Abstract

There has been considerable recent interest in distribution tests whose run-time and sample requirements are sublinear in the domain size k. We study two of the most important tests under the conditional-sampling model, where each query specifies a subset S of the domain and the response is a sample drawn from S according to the underlying distribution.

For identity testing, which asks whether the underlying distribution equals a specific given distribution or ε-differs from it, we reduce the known time and sample complexities from $\tilde{O}(\epsilon^{-4})$ to $\tilde{O}(\epsilon^{-2})$, thereby matching the information-theoretic lower bound. For closeness testing, which asks whether two distributions underlying observed data sets are equal or different, we reduce the existing complexity from $\tilde{O}(\epsilon^{-4}\log^5 k)$ to an even sub-logarithmic $\tilde{O}(\epsilon^{-5}\log\log k)$, thus providing a better bound for an open problem posed at the Bertinoro Workshop on Sublinear Algorithms [Fisher, 2014].

Keywords: Property testing, conditional sampling, sublinear algorithms 


1 Introduction 

1.1 Background 

The question of whether two probability distributions are the same or substantially different arises 
in many important applications. We consider two variations of this problem: identity testing where 
one distribution is known while the other is revealed only via its samples, and closeness testing 
where both distributions are revealed only via their samples. 

As its name suggests, identity testing arises when an identity needs to be verified: for example, testing whether a given person generated an observed fingerprint, whether a specific author wrote an unattributed document, or whether a certain disease caused the symptoms experienced by a patient. In all these cases we may have sufficient information to accurately infer the true identity's underlying distribution, and ask whether this distribution also generated newly observed samples. For example, multiple original high-quality fingerprints can be used to infer the fingerprint structure, which can then be used to decide whether it generated newly observed fingerprints.




Closeness testing arises when we try to discern whether the same entity generated two different data sets: for example, whether two fingerprints were generated by the same individual, whether two documents were written by the same author, or whether two patients suffer from the same disease. In these cases, we
do not know the distribution underlying each data set, but would still like to determine whether 
they were generated by the same distribution or by two different ones. 

Both problems have been studied extensively. In the hypothesis-testing framework, researchers studied the asymptotic test error as the number of samples tends to infinity [see Ziv, 1988, Unnikrishnan, 2012, and references therein]. We will follow a more recent, non-asymptotic approach. Two distributions p and q are ε-far if

$$\|p-q\|_1 \ge \epsilon.$$

An identity test for a given distribution p considers independent samples from an unknown distribution q and declares either q = p or that they are ε-far. The test's error probability is the highest probability that it errs, maximized over q = p and every q that is ε-far from p. Note that if p and q are neither the same nor ε-far, namely if $0 < \|q-p\|_1 < \epsilon$, neither answer constitutes an error.

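Let $N_{\mathrm{id}}(k,\epsilon,\delta)$ be the smallest number of samples needed to identity test every k-element distribution with error probability $\le \delta$. It can be shown that the sample complexity depends on δ only mildly: $N_{\mathrm{id}}(k,\epsilon,\delta) \le O\bigl(N_{\mathrm{id}}(k,\epsilon,0.1)\cdot\log\frac{1}{\delta}\bigr)$. Hence we focus on $N_{\mathrm{id}}(k,\epsilon,0.1)$, denoting it by $N_{\mathrm{id}}(k,\epsilon)$.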

This formulation was introduced by Goldreich and Ron [2000] who, motivated by testing graph expansion, considered identity testing of uniform distributions. Paninski [2008] showed that the sample complexity of identity testing for the uniform distribution is $\Theta(\epsilon^{-2}\sqrt{k})$. General identity testing was studied by Batu et al. [2001], who showed that $N_{\mathrm{id}}(k,\epsilon) \le \tilde{O}(\epsilon^{-2}\sqrt{k})$, and recently Valiant and Valiant [2013] proved a matching lower bound, implying that $N_{\mathrm{id}}(k,\epsilon) = \Theta(\epsilon^{-2}\sqrt{k})$, where $\tilde{O}$, and later $\tilde{\Theta}$ and $\tilde{\Omega}$, hide multiplicative logarithmic factors.

Similarly, a closeness test takes independent samples from p and q and declares them either to be the same or ε-far. The test's error probability is the highest probability that it errs, maximized over q = p and every p and q that are ε-far. Let $N_{\mathrm{cl}}(k,\epsilon,\delta)$ be the smallest number of samples that suffice to closeness test every two k-element distributions with error probability $\le \delta$. Here too it suffices to consider $N_{\mathrm{cl}}(k,\epsilon) \stackrel{\mathrm{def}}{=} N_{\mathrm{cl}}(k,\epsilon,0.1)$.

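Closeness testing was first studied by Batu et al. [2000], who showed that $N_{\mathrm{cl}}(k,\epsilon) \le \tilde{O}(\epsilon^{-4}k^{2/3})$. Recently, Valiant [2011] and Chan et al. [2014b] showed that $N_{\mathrm{cl}}(k,\epsilon) = \Theta\bigl(\max(\epsilon^{-4/3}k^{2/3},\ \epsilon^{-2}\sqrt{k})\bigr)$.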

1.2 Alternative models 

The problem's elegance, intrinsic interest, and potential applications have led several researchers to consider scenarios where fewer samples may suffice. Monotone, log-concave, and m-modal distributions were considered in Rubinfeld and Servedio [2009], Daskalakis et al. [2013], Diakonikolas et al. [2015], and Chan et al. [2014a], and their sample complexity was shown to decline from a polynomial in k to a polynomial in log k. For example, identity testing of monotone distributions over k elements requires $\Theta(\epsilon^{-5/2}\sqrt{\log k})$ samples, and identity testing of log-concave distributions over k elements requires a number of samples that depends only on ε, independent of the support size k.

A competitive framework that analyzes the optimality for every pair of distributions was considered in Acharya et al. [2012] and Valiant and Valiant [2013]. Other related scenarios include classification [Acharya et al., 2012], outlier detection [Acharya et al., 2014b], testing collections of distributions [Levi et al., 2013], testing for the class of monotone distributions [Batu et al., 2004], testing for the class of Poisson Binomial distributions [Acharya and Daskalakis, 2015], and testing under different distance measures [Guha et al., 2009, Waggoner, 2015].

Another direction lowered the sample complexity for all distributions by considering more powerful queries. Perhaps the most natural is the conditional-sampling model introduced independently in Chakraborty et al. [2013] and Canonne et al. [2014], where instead of obtaining samples from the entire support set, each query specifies a query set $S \subseteq [k]$ and the samples are then selected from S in proportion to their original probability, namely element i is selected with probability

$$p_S(i) = \begin{cases} \dfrac{p(i)}{p(S)} & i \in S,\\[4pt] 0 & \text{otherwise,} \end{cases}$$

where p(S) is the probability of the set S under p. Conditional sampling is a natural extension of sampling, and Chakraborty et al. [2013] describe several scenarios where it may arise. Note that, unlike other works in distribution testing, conditional-sampling algorithms can be adaptive, i.e., each query set can depend on previous queries and observed samples. It is similar in spirit to machine learning's popular active testing paradigm, where additional information is interactively requested for specific domain elements. Balcan et al. [2012] showed that various problems, such as testing unions of intervals and testing linear separators, benefit significantly from the active testing model.
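To make the model concrete, the following sketch simulates a conditional-sampling oracle for a known distribution. It is only an illustration: the function name `conditional_sample` and the dict representation of p are our assumptions, and the behavior when p(S) = 0 (here, an exception) is not specified by the model described above.

```python
import random
from collections import Counter

def conditional_sample(p, S, rng):
    """Return one sample from p_S: element i in S with probability p(i)/p(S).

    p:   dict mapping element -> probability
    S:   iterable of query elements (a subset of the domain)
    rng: a random.Random instance
    """
    elems = [i for i in S if p.get(i, 0.0) > 0]
    if not elems:
        # p(S) = 0: left open by the model above; this sketch simply refuses.
        raise ValueError("query set has zero probability under p")
    # random.choices normalizes the weights, so i is drawn w.p. p(i)/p(S).
    return rng.choices(elems, weights=[p[i] for i in elems], k=1)[0]

rng = random.Random(0)
p = {1: 0.5, 2: 0.3, 3: 0.2}
counts = Counter(conditional_sample(p, {2, 3}, rng) for _ in range(20000))
print(counts[2] / 20000)  # should be close to p(2)/p({2, 3}) = 0.6
```

Element 1 never appears in these samples because it lies outside the query set.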

Let $N^*_{\mathrm{id}}(k,\epsilon)$ and $N^*_{\mathrm{cl}}(k,\epsilon)$ be the number of samples required for identity and closeness testing under the conditional-sampling model. For identity testing, Canonne et al. [2014] showed that conditional sampling eliminates the dependence on k:

$$\Omega(\epsilon^{-2}) \le N^*_{\mathrm{id}}(k,\epsilon) \le \tilde{O}(\epsilon^{-4}).$$

For closeness testing, the same paper showed that

$$N^*_{\mathrm{cl}}(k,\epsilon) \le \tilde{O}(\epsilon^{-4}\log^5 k).$$

Chakraborty et al. [2013] showed that $N^*_{\mathrm{id}}(k,\epsilon) \le \mathrm{poly}(\log^* k, \epsilon^{-1})$ and designed a $\mathrm{poly}(\log k, \epsilon^{-1})$ algorithm for testing any label-invariant property. They also derived an $\Omega(\sqrt{\log\log k})$ lower bound for testing any label-invariant property.

An open problem posed by Fisher [2014] asked for the sample complexity of closeness testing under conditional sampling. It was partly answered by Acharya et al. [2014a], who showed that

$$N^*_{\mathrm{cl}}(k, 1/4) \ge \Omega(\sqrt{\log\log k}).$$


1.3 New results 

Our first result resolves the sample complexity of identity testing with conditional sampling. For identity testing we show that

$$N^*_{\mathrm{id}}(k,\epsilon) \le \tilde{O}(\epsilon^{-2}).$$

Along with the information-theoretic lower bound above, this yields

$$N^*_{\mathrm{id}}(k,\epsilon) = \tilde{\Theta}(\epsilon^{-2}).$$

For closeness testing, we address the open problem of Fisher [2014] by reducing the dependence of the upper bound on k from $\log^5 k$ to $\log\log k$. We show that

$$N^*_{\mathrm{cl}}(k,\epsilon) \le \tilde{O}(\epsilon^{-5}\log\log k).$$

This very mild, double-logarithmic dependence on the alphabet size may be the first sub-poly-logarithmic growth rate of any non-constant-complexity property, and together with the lower bound in Acharya et al. [2014a] it shows that the dependence on k is indeed poly-double-logarithmic.

The rest of the paper is organized as follows. We first study identity testing in Section 2. In Section 3 we propose an algorithm for closeness testing. All proofs are given in the Appendix.


2 Identity testing 

In the following, p is a distribution over $[k] \stackrel{\mathrm{def}}{=} \{1,\ldots,k\}$, p(i) is the probability of $i \in [k]$, |S| is the cardinality of $S \subseteq [k]$, $p_S$ is the conditional distribution of p when S is queried, and n is the number of samples. For an element i, n(i) denotes the number of occurrences of i.

This section is organized as follows. We first motivate our identity test using restricted uniformity testing, a special case of identity testing. We then highlight two important aspects of our identity test: finding a distinguishing element i and finding a distinguishing set S. We then provide a simple algorithm for finding a distinguishing element. As we show, finding distinguishing sets is easy when testing near-uniform distributions, and we give an algorithm for testing near-uniform distributions. We later use the near-uniform case as a subroutine for testing any general distribution.


2.1 Example: restricted uniformity testing 

Consider the class of distributions Q, where each $q \in Q$ has k/2 elements with probability $(1+\epsilon)/k$ and k/2 elements with probability $(1-\epsilon)/k$. Let p be the uniform distribution, namely $p(i) = 1/k$ for all $1 \le i \le k$. Hence for every $q \in Q$, $\|p-q\|_1 = \epsilon$.

We now motivate our test via a simpler restricted uniformity testing, a special case of identity 
testing where one determines if a distribution is p or if it belongs to the class Q. 

If we know two elements i, j such that $q(i) = \frac{1+\epsilon}{k} > \frac{1}{k} = p(i)$ and $q(j) = \frac{1-\epsilon}{k} < \frac{1}{k} = p(j)$, it suffices to consider the set S = {i, j}. For this set

$$p_S(i) = \frac{p(i)}{p(i)+p(j)} = \frac{1/k}{2/k} = \frac{1}{2}, \qquad p_S(j) = \frac{p(j)}{p(i)+p(j)} = \frac{1}{2},$$

while

$$q_S(i) = \frac{q(i)}{q(i)+q(j)} = \frac{(1+\epsilon)/k}{(1+\epsilon)/k + (1-\epsilon)/k} = \frac{1+\epsilon}{2},$$

and similarly $q_S(j) = (1-\epsilon)/2$. Thus differentiating between $p_S$ and $q_S$ is the same as differentiating between $B(1/2)$ and $B((1+\epsilon)/2)$, for which a simple application of the Chernoff bound shows that $O(\epsilon^{-2})$ samples suffice. Thus the sample complexity is $O(\epsilon^{-2})$ if we knew such a set S.
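For concreteness, one standard way to carry out this Chernoff step is Hoeffding's inequality. This is our illustration, and the threshold halfway between the two means is our choice rather than something stated above. Let $\hat{\mu}$ be the fraction of the n conditional samples from S = {i, j} that equal i, and declare non-uniform if $\hat{\mu} > \frac{1}{2} + \frac{\epsilon}{4}$:

```latex
\Pr_{p_S}\!\left[\hat{\mu} > \tfrac{1}{2} + \tfrac{\epsilon}{4}\right]
  \le e^{-2n(\epsilon/4)^2} = e^{-n\epsilon^2/8},
\qquad
\Pr_{q_S}\!\left[\hat{\mu} \le \tfrac{1}{2} + \tfrac{\epsilon}{4}\right]
  \le e^{-2n(\epsilon/4)^2} = e^{-n\epsilon^2/8}.
```

The second bound uses that under $q_S$ the mean is $\frac{1}{2} + \frac{\epsilon}{2}$, so the bad event is a deviation of $\epsilon/4$ below the mean. Hence $n = \lceil 8\ln(1/\delta)/\epsilon^2 \rceil = O(\epsilon^{-2}\log(1/\delta))$ conditional samples make both error probabilities at most δ.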

Next consider the same class of distributions Q, but without knowledge of the elements i and j. We can pick two elements uniformly at random from all $\binom{k}{2}$ pairs. With probability $\ge 1/2$, the two elements will have different probabilities as above, and again we could determine whether the distribution is uniform. Our success probability is half the success probability when S is known, but it can be increased by repeating the experiment several times and declaring the distribution to be non-uniform if one of the choices of i and j indicates non-uniformity.
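The random-pair strategy can be simulated as follows. This is a sketch under stated assumptions: distributions are lists of probabilities, conditional samples from {i, j} are simulated directly, and the acceptance band of ε/4 around 1/2, the repetition count `reps`, and the function names are our choices. The per-pair sample size comes from Hoeffding's inequality with the false-alarm budget split evenly over the repetitions.

```python
import math
import random

def pair_flags_nonuniform(q, i, j, eps, n, rng):
    """Draw n conditional samples from {i, j} under q and flag non-uniformity
    if the fraction of i's deviates from 1/2 by more than eps/4."""
    share_i = q[i] / (q[i] + q[j])          # q_{i,j}(i)
    hits = sum(rng.random() < share_i for _ in range(n))
    return abs(hits / n - 0.5) > eps / 4

def restricted_uniformity_test(q, eps, delta, reps, rng):
    """Declare "non-uniform" if any of `reps` random pairs flags non-uniformity."""
    k = len(q)
    # Hoeffding: P(|mean - 1/2| > eps/4) <= 2 exp(-n eps^2 / 8) under uniformity,
    # so this n keeps the total false-alarm probability over all pairs <= delta.
    n = math.ceil(8 * math.log(2 * reps / delta) / eps ** 2)
    for _ in range(reps):
        i, j = rng.sample(range(k), 2)
        if pair_flags_nonuniform(q, i, j, eps, n, rng):
            return "non-uniform"
    return "uniform"

k, eps = 100, 0.5
uniform = [1 / k] * k
far = [(1 + eps) / k] * (k // 2) + [(1 - eps) / k] * (k // 2)
print(restricted_uniformity_test(uniform, eps, 0.001, 20, random.Random(1)))
print(restricted_uniformity_test(far, eps, 0.001, 20, random.Random(2)))
```

With 20 repetitions, the chance that every sampled pair comes from the same probability class is below $0.5^{20}$, which is why a modest number of repetitions suffices in this example.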

While the above example illustrates tests for the uniform distribution, finding such elements i, j can be difficult for non-uniform distributions. Instead of finding pairs of elements, we find a distinguishing element i and a distinguishing set S such that $q(i) < p(i) \approx p(S) < q(S)$. Thus when conditional samples from $S \cup \{i\}$ are observed, the number of times i appears would differ significantly, and one can use Chernoff-type arguments to differentiate between same and diff. While previous authors have used similar methods, our main contribution is to design an information-theoretically near-optimal identity test.

Before we proceed to identity testing, we quantify the Chernoff-type arguments formally using Test-equal. It takes samples from two unknown binary distributions p, q (without loss of generality over {0, 1}; below, p and q also denote the probabilities of the symbol 1), an error probability δ, and a parameter ε, and it tests whether p = q or $\frac{(p-q)^2}{p+q} \ge \epsilon$. We use the chi-squared distance $\frac{(p-q)^2}{p+q}$ as the measure of distance instead of $\ell_1$ since it captures the sample complexity more accurately. For example, consider two scenarios: $p, q = B(1/2), B(1/2+\epsilon/2)$ or $p, q = B(0), B(\epsilon/2)$. In both cases $\|p-q\|_1 = \epsilon$, but the number of samples required to distinguish p and q is $\Theta(\epsilon^{-2})$ in the first case, while $O(\epsilon^{-1})$ suffice in the second. The chi-squared distance, however, captures the sample complexity correctly: it is $O(\epsilon^2)$ in the first case and $O(\epsilon)$ in the second. While several other simple hypothesis tests exist, the algorithm below has near-optimal sample complexity in terms of ε and δ.
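The two scenarios can be checked numerically. The helper names below are ours; as in the text, p and q denote the probabilities of the symbol 1.

```python
def l1_bernoulli(p, q):
    """||B(p) - B(q)||_1 = |p - q| + |(1 - p) - (1 - q)| = 2|p - q|."""
    return 2 * abs(p - q)

def chi_squared(p, q):
    """Chi-squared distance (p - q)^2 / (p + q) used by Test-equal."""
    return (p - q) ** 2 / (p + q)

eps = 0.1
scenarios = [(0.5, 0.5 + eps / 2), (0.0, eps / 2)]
for p, q in scenarios:
    print(f"p={p}, q={q}: l1={l1_bernoulli(p, q):.4f}, chi2={chi_squared(p, q):.5f}")
```

Both $\ell_1$ values equal ε, while the chi-squared values are $\epsilon^2/(4(1+\epsilon/2)) \approx 0.0024$ and $\epsilon/2 = 0.05$, in line with the $\Theta(\epsilon^{-2})$ versus $O(\epsilon^{-1})$ sample counts.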


Algorithm Test-equal

Input: chi-squared bound ε, error δ, distributions B(p) and B(q).

Parameters: n = O(1/ε).

Repeat $18\log\frac{1}{\delta}$ times and output the majority:

1. Let $n' = \mathrm{poi}(n)$ and $n'' = \mathrm{poi}(n)$ be two independent Poisson variables with mean n.

2. Draw samples $x_1, x_2, \ldots, x_{n'}$ from the first distribution and $y_1, y_2, \ldots, y_{n''}$ from the second one.

3. Let $n_1 = \sum_{i=1}^{n'} x_i$ and $n_2 = \sum_{i=1}^{n''} y_i$.

4. If $\frac{(n_1-n_2)^2 - n_1 - n_2}{n_1+n_2-1} \le \frac{n\epsilon}{2}$, output same; otherwise output diff.

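Lemma 1 (Appendix B.1). If p = q, then Test-equal outputs same with probability $\ge 1-\delta$. If $\frac{(p-q)^2}{p+q} \ge \epsilon$, it outputs diff with probability $\ge 1-\delta$. Furthermore, the algorithm uses $O\bigl(\frac{1}{\epsilon}\cdot\log\frac{1}{\delta}\bigr)$ samples.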

2.2 Finding a distinguishing element i 

We now give an algorithm to find an element i such that p(i) > q(i). In the example above, we could find such an element with probability $\ge 1/2$ by selecting i uniformly at random out of all elements. However, for some distributions this probability is much lower. For example, consider the following distributions p and q: $p(1) = \epsilon/2$, $p(2) = 0$, $p(i) = \frac{1-\epsilon/2}{k-2}$ for $i > 2$, and $q(1) = 0$, $q(2) = \epsilon/2$, $q(i) = \frac{1-\epsilon/2}{k-2}$ for $i > 2$. Again note that $\|p-q\|_1 = \epsilon$. If we pick i at random, the chance that p(i) > q(i) is 1/k, too small for our purpose. A better way of selecting i is to sample according to p itself. For example, the probability of finding an element i such that p(i) > q(i) when sampling from p is $\epsilon/2 \gg 1/k$.
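The two selection rules can be compared exactly on this example. The helper names are ours, and element i is stored at list index i − 1.

```python
def example_pair(k, eps):
    """The pair (p, q) from the example above, as lists indexed 0..k-1."""
    rest = (1 - eps / 2) / (k - 2)
    p = [eps / 2, 0.0] + [rest] * (k - 2)
    q = [0.0, eps / 2] + [rest] * (k - 2)
    return p, q

def prob_distinguishing(weights, p, q):
    """Probability that an element drawn from `weights` satisfies p(i) > q(i)."""
    total = sum(weights)
    return sum(w for w, pi, qi in zip(weights, p, q) if pi > qi) / total

k, eps = 1000, 0.1
p, q = example_pair(k, eps)
uniform_pick = prob_distinguishing([1.0] * k, p, q)   # picking i uniformly
p_pick = prob_distinguishing(p, p, q)                  # sampling i from p
print(uniform_pick, p_pick)
```

Only the first element satisfies p(i) > q(i), so uniform selection finds it with probability 1/k while sampling from p finds it with probability ε/2.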

We quantify the above idea next using the following simple algorithm that picks elements such that p(i) > q(i). We first need the following definition. Without loss of generality assume that the elements are ordered such that $p(1) \ge p(2) \ge p(3) \ge \cdots \ge p(k)$.

Definition 2. For a distribution p, element i is α-heavy if

$$\sum_{j \ge i} p(j) \ge \alpha.$$

As we show in the proofs, symbols that are heavy (α large) can easily be used as distinguishing symbols, and hence our goal is to choose symbols such that p(i) > q(i) and i is α-heavy for a large value of α. To this end, first consider an auxiliary result showing that if, for some non-negative values $a_i$, $\sum_i p(i)\,a_i \ge \epsilon/4$, then the following sampling algorithm will pick an element $x_j$ such that $x_j$ is $\alpha_j$-heavy and $a_{x_j} \ge \beta_j$. While several other algorithms have similar properties, the following algorithm achieves a good trade-off between α and β (its parameters satisfy $\alpha_j\beta_j = \frac{\epsilon}{32\log(16/\epsilon)}$ for every j) and hence is useful in achieving near-optimal sample complexity.


Algorithm Find-element
Input: parameter ε, distribution p.

Parameters: m = 16/ε, $\beta_j = j\epsilon/8$, $\alpha_j = \frac{1}{4j\log(16/\epsilon)}$.

1. Draw m independent samples $x_1, x_2, \ldots, x_m$ from p.

2. Output tuples $(x_1, \beta_1, \alpha_1), (x_2, \beta_2, \alpha_2), \ldots, (x_m, \beta_m, \alpha_m)$.


Lemma 3 (Appendix B.2). For $1 \le i \le k$, let $a_i$ be such that $0 \le a_i \le 2$. If $\sum_i p(i)\,a_i \ge \epsilon/4$, then with probability $\ge 1/5$, at least one tuple (x, β, α) returned by Find-element(ε, p) satisfies the property that x is α-heavy and $a_x \ge \beta$. Furthermore, it uses 16/ε samples.
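A direct transcription of Find-element, together with the α-heavy test of Definition 2, is sketched below. Assumptions: natural logarithms, m rounded up to an integer, and p given as a list sorted in non-increasing order for the heaviness check. The sketch also confirms that every tuple has the same product $\alpha_j\beta_j = \epsilon/(32\log(16/\epsilon))$.

```python
import math
import random

def find_element(sample, eps):
    """Find-element: draw m = 16/eps samples and attach (beta_j, alpha_j).

    sample: zero-argument callable returning one draw from p.
    Returns tuples (x_j, beta_j, alpha_j) for j = 1..m.
    """
    m = math.ceil(16 / eps)
    log_term = math.log(16 / eps)
    return [(sample(), j * eps / 8, 1 / (4 * j * log_term)) for j in range(1, m + 1)]

def is_alpha_heavy(p_sorted, i, alpha):
    """Definition 2 for 0-indexed i: the tail mass from i onward is at least alpha."""
    return sum(p_sorted[i:]) >= alpha

rng = random.Random(0)
p_sorted = [0.4, 0.3, 0.2, 0.1]
eps = 0.5
tuples = find_element(lambda: rng.choices(range(4), weights=p_sorted)[0], eps)
print(len(tuples), [x for x, _, _ in tuples][:5])
```

Early tuples come with a small β but a large α, and later tuples with a large β but a small α, which is the trade-off discussed below.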

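We now use the above lemma to pick elements such that p(i) > q(i). Since $\|p-q\|_1 \ge \epsilon$ and $\sum_i (p(i)-q(i)) = 0$, the positive differences carry half of the $\ell_1$ distance:

$$\sum_{i:\,p(i)>q(i)} \bigl(p(i)-q(i)\bigr) \ge \frac{\epsilon}{2}.$$

Hence

$$\sum_i p(i)\,\max\Bigl(0,\ \frac{p(i)-q(i)}{p(i)}\Bigr) \ge \frac{\epsilon}{2}.$$

Applying Lemma 3 with $a_i = \max\bigl(0, \frac{p(i)-q(i)}{p(i)}\bigr)$ (and $a_i = 0$ when $p(i) = 0$), which satisfies $0 \le a_i \le 1$, yields

Lemma 4. If $\|p-q\|_1 \ge \epsilon$, then with probability $\ge 1/5$ at least one of the tuples (i, β, α) returned by Find-element(ε, p) satisfies $p(i) - q(i) \ge \beta\,p(i)$ and i is α-heavy. Furthermore, Find-element uses 16/ε samples.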


Note that even though the above algorithm does not use the distribution q, it finds i such that $p(i) - q(i) \ge \beta\,p(i)$ just by the properties of the distance. Furthermore, $\beta_j$ increases with j and $\alpha_j$ decreases with j; thus the above lemma states that the algorithm finds an element i such that either $(p(i)-q(i))/p(i)$ is large but i may not be heavy, or $(p(i)-q(i))/p(i)$ is small yet i belongs to one of the higher probabilities. This precise trade-off becomes important in bounding the sample complexity.


2.3 Testing for near-uniform distributions 

We define a distribution p to be near-uniform if $\max_i p(i) \le 2\min_i p(i)$. Recall that we need to find a distinguishing element and a distinguishing set. As we show, for near-uniform distributions there are singleton distinguishing sets, and hence they are easy to find. Using Find-element, we first define a meta algorithm to test near-uniform distributions. The inputs to the algorithm are a parameter ε, an error δ, distributions p and q, and an element y such that p(y) > q(y). Since we use Near-uniform-identity-test as a subroutine later, y is supplied by the main algorithm. However, if we want to use Near-uniform-identity-test by itself, we can find a y using Find-element(ε, p).

The algorithm uses Find-element to find an element x such that $q(x) - p(x) \ge \beta\,q(x)$. Since p(y) > q(y) and $q(x) - p(x) \ge \beta\,q(x)$, running Test-equal on the set {x, y} yields an algorithm for identity testing. The precise bounds in Lemmas 1 and 3 help us obtain the optimal sample complexity. In particular,

Lemma 5 (Appendix B.3). If p = q, then Near-uniform-identity-test returns same with probability $\ge 1-\delta$. If p is near-uniform and $\|p-q\|_1 \ge \epsilon$, then Near-uniform-identity-test returns diff with probability $\ge 1/5 - \delta$. The algorithm uses $O\bigl(\frac{1}{\epsilon^2}\cdot\log\frac{1}{\delta}\bigr)$ samples.


Algorithm Near-uniform-identity-test

Input: distance ε, error δ, distributions p, q, an element y such that p(y) > q(y).

1. Run Find-element(ε, q) to obtain tuples $(x_j, \beta_j, \alpha_j)$ for $1 \le j \le 16/\epsilon$.

2. For every tuple $(x_j, \beta_j, \alpha_j)$, run Test-equal$\bigl(\beta_j^2/144,\ 6\delta/(\pi^2 j^2),\ p_{\{x_j,y\}},\ q_{\{x_j,y\}}\bigr)$.

3. Output same if Test-equal in the previous step returns same for all tuples; otherwise output diff.
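The per-tuple error levels in step 2 are chosen so that a union bound over all tuples stays within δ, because $\sum_{j\ge1} 1/j^2 = \pi^2/6$. A quick numerical check of that schedule (our illustration; the function name is hypothetical):

```python
import math

def error_budgets(delta, eps):
    """Per-tuple error levels 6*delta/(pi^2 j^2), j = 1..ceil(16/eps), as in step 2."""
    m = math.ceil(16 / eps)
    return [6 * delta / (math.pi ** 2 * j ** 2) for j in range(1, m + 1)]

budgets = error_budgets(delta=0.05, eps=0.125)
print(len(budgets), sum(budgets))  # the sum stays strictly below delta = 0.05
```

The truncated sum is only slightly below δ, so the schedule wastes almost none of the error budget while still letting the first few tuples, which have the smallest β, run with the largest error allowance.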


2.4 Finding a distinguishing set for general distributions 

We now extend Near-uniform-identity-test to general distributions. Recall that we need to 
find a distinguishing element and a distinguishing set. 

Once we have an element i such that p(i) > q(i), our objective is to find a distinguishing set S such that p(S) < q(S) and $p(S) \approx p(i)$. Natural candidates for such sets are combinations of elements whose probabilities are at most p(i). Since p is known, we can select such sets easily. Let $G_i = \{j : j \ge i\}$. Consider the sets $H_1, H_2, \ldots$ formed by combining elements in $G_i$ such that $p(i) \le p(H_j) \le 2p(i)$ for all j. Ideally we would like to use one of these $H_j$'s as S; however, depending on the values of $p(H_j)$, three possible scenarios arise, and they constitute the main algorithm.

We need one more definition for describing the main identity test. For any distribution p and a partition of S into disjoint subsets $\mathcal{S} = \{S_1, S_2, \ldots\}$, the induced distribution $p^{\mathcal{S}}$ is a distribution over $S_1, S_2, \ldots$ such that for all i,

$$p^{\mathcal{S}}(S_i) = \frac{p(S_i)}{p(S)}.$$
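Since p is known, groups with masses in $[p(i), 2p(i))$ can be formed greedily, and the induced distribution is then a simple normalization. The greedy rule, the treatment of a final under-full group, and the function names are our assumptions; the text does not specify how $H_1, H_2, \ldots$ are constructed.

```python
def greedy_groups(probs, target):
    """Group consecutive elements (indices into probs) so that each closed group
    has mass in [target, 2*target). Requires every prob <= target, which holds
    for elements ranked after i when target = p(i). Returns (groups, leftover),
    where leftover is a final group of mass < target (its handling is left open).
    """
    groups, current, mass = [], [], 0.0
    for idx, w in enumerate(probs):
        if w > target:
            raise ValueError("each element must have probability at most target")
        current.append(idx)
        mass += w
        if mass >= target:  # mass was < target before adding w, and w <= target
            groups.append(current)
            current, mass = [], 0.0
    return groups, current

def induced(probs, groups):
    """Induced distribution p(S_i) / p(S) over the groups, S = union of groups."""
    masses = [sum(probs[idx] for idx in g) for g in groups]
    total = sum(masses)
    return [m / total for m in masses]

probs = [0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.03125]
groups, leftover = greedy_groups(probs, target=0.25)
print(groups, leftover, induced(probs, groups))
```

Each closed group stays below twice the target because its mass was under the target just before the last element, itself at most the target, was added.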

2.5 Proposed identity test 

The algorithm is a combination of tests, one for each possible scenario. First it finds a set of tuples (i, β, α) such that one tuple satisfies $(p(i)-q(i))/p(i) \ge \beta$ and i is α-heavy. Then it divides $G_i$ into $H_1, H_2, \ldots$ such that $p(i) \le p(H_j) \le 2p(i)$ for all j. If $\|p-q\|_1 \ge \epsilon$, then there are three possible cases.

1. $p(H_j)(1-\beta/2) \le q(H_j)$ for most j's. We can randomly pick a set $H_j$, sample from $H_j \cup \{i\}$, and test whether $\|p-q\|_1 \ge \epsilon$ using n(i), the number of times i appears among these samples.

2. $p(H_j)(1-\beta/2) > q(H_j)$ for most j's. Then we have $p(G_i)(1-\beta/2) > q(G_i)$, and since $p(G_i) \ge \alpha$, we can sample from the entire distribution and use $n(G_i)$ to test whether $\|p-q\|_1 \ge \epsilon$.

3. For some j, $p(H_j)(1-\beta/2) > q(H_j)$, and for some j, $p(H_j)(1-\beta/2) \le q(H_j)$. It can be shown that this condition implies that the elements in $G_i$ can be grouped into $H_1, H_2, \ldots$ such that the induced distribution on the groups is near-uniform and yet the $\ell_1$ distance between the induced distributions is large. We use Near-uniform-identity-test for this scenario.

The algorithm has a step corresponding to each of the above three scenarios. If p = q, then all three steps output same with high probability; otherwise one of the steps outputs diff. The main result of this section bounds the sample complexity of Identity-test.

Theorem 6 (Appendix B.4). If p = q, then Identity-test returns same with probability $\ge 1-\delta$, and if $\|p-q\|_1 \ge \epsilon$, then Identity-test returns diff with probability $\ge 1/30$. The algorithm uses at most $O\bigl(\frac{1}{\epsilon^2}\cdot\log^2\frac{1}{\epsilon}\cdot\log\frac{1}{\delta}\bigr)$ samples.

The proposed identity test has different error probabilities when p = q and when $\|p-q\|_1 \ge \epsilon$. In particular, if p = q, the algorithm returns same with probability $\ge 1-\delta$, and if $\|p-q\|_1 \ge \epsilon$, it outputs diff with probability $\ge 1/30$. While the probability of success for $\|p-q\|_1 \ge \epsilon$ is small, it can be boosted arbitrarily close to 1 by repeating the algorithm $O(\log(1/\delta))$ times and testing whether it outputs diff in more than a 1/60 fraction of the runs. By a simple Chernoff-type argument, it can be shown that in both cases, p = q and $\|p-q\|_1 \ge \epsilon$, the error probability of the boosted algorithm is $\le \delta$. Furthermore, throughout the paper we have calculated all the constants except in the sample complexities, which we have left in O notation.
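A sketch of the boosting step, with the base test simulated by biased coins. The repetition constant `c` and the simulated false-diff rate of 0.001 for the p = q case are our assumptions; the sketch only illustrates the majority-style threshold at 1/60, which works as long as each run's false-diff probability stays well below 1/60.

```python
import math
import random

def boosted(run_once, delta, c=1000):
    """Repeat a weak tester O(log(1/delta)) times and output diff if more than
    a 1/60 fraction of the runs output diff.

    run_once: zero-argument callable returning "same" or "diff".
    c: hypothetical constant in the repetition count; it has to be fairly large
       because the two cases are separated only by the gap between 1/30 and 1/60.
    """
    reps = math.ceil(c * math.log(1 / delta))
    diffs = sum(run_once() == "diff" for _ in range(reps))
    return "diff" if diffs > reps / 60 else "same"

rng = random.Random(7)

def far_run():
    # simulated base test when ||p - q||_1 >= eps: diff with probability 1/30
    return "diff" if rng.random() < 1 / 30 else "same"

def same_run():
    # simulated base test when p = q: small false-diff rate (0.001, our choice)
    return "diff" if rng.random() < 0.001 else "same"

print(boosted(far_run, delta=0.01), boosted(same_run, delta=0.01))
```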


Algorithm Identity-test

Input: error δ, distance ε, an unknown distribution q, and a known distribution p.

1. Run Find-element(ε, p) to obtain tuples (x, β, α).

2. For every tuple (x, β, α):

(a) Let $G_x = \{y : y \ge x\}$.

(b) Partition $G_x$ into groups $\mathcal{H} = H_1, H_2, \ldots$ such that for each group $H_j$, $p(x) \le p(H_j) \le 2p(x)$.

(c) Take a random sample y from $p^{\mathcal{H}}$ and run Test-equal$(\ldots, \ldots, p_{\{x,y\}}, q_{\{x,y\}})$.

(d) Run Test-equal$((\ldots)^2, \ldots)$.

(e) Run Near-uniform-identity-test$(\ldots)$.

3. Output diff if any of the above tests returns diff for any tuple; otherwise output same.


3 Closeness testing 

Recall that in closeness testing, both p and q are unknown, and we test whether p = q or $\|p-q\|_1 \ge \epsilon$ using samples. First we relate identity testing to closeness testing.



Identity testing had two parts: finding a distinguishing element i and finding a distinguishing set S. The algorithm we used to generate i did not use any a priori knowledge of the distribution, hence it carries over to closeness testing easily. The main difficulty in extending identity testing to closeness testing is finding a distinguishing set. Recall that in identity testing, we ordered elements so that their probabilities are decreasing and considered the set $G_i = \{j : j \ge i\}$ to find a distinguishing set. $G_i$ was known in identity testing; in closeness testing, however, it is unknown and difficult to find.

The rest of the section is organized as follows. We first outline a method of identifying a distinguishing set by sampling at a certain frequency (which is unknown). We then formalize finding a distinguishing element and show how one can use a binary search to find the sampling frequency and a distinguishing set. We finally describe our main closeness test, which requires a few additional techniques to handle some special cases.

3.1 Outline for finding a distinguishing set 

Recall that in identity testing, we ordered elements so that their probabilities are decreasing and considered $G_i = \{j : j \ge i\}$. We then used a subset $S \subseteq G_i$ such that $p(S) \approx p(i)$ as the distinguishing set. However, in closeness testing this is not possible, as the set $G_i$ is unknown. We now outline a method of finding such a set S using random sampling without knowledge of $G_i$.

Without loss of generality, assume that elements are ordered such that $p(1)+q(1) \ge p(2)+q(2) \ge \cdots \ge p(k)+q(k)$. The algorithm does not use this fact; the assumption only eases the notation in the proofs. Let $G_i = \{j : j \ge i\}$ under this ordering ($G_i$ serves the same purpose as in identity testing, but it is symmetric with respect to p and q and hence easier to handle). Furthermore, for simplicity, in the rest of the section assume that $p(i) > q(i)$ and $p(G_i) < q(G_i)$. Suppose we come up with a scheme that finds a subset S of $G_i$ such that $p(S) \approx p(i)$ and $p(S) < q(S)$; then, as in Identity-test, we can use that scheme together with Test-equal on $S \cup \{i\}$ to differentiate between p = q and $\|p-q\|_1 \ge \epsilon$.

The main challenge of the algorithm is to find a distinguishing subset of $G_i$. Let $r = (p+q)/2$, i.e., $r(j) = (p(j)+q(j))/2$ for all $1 \le j \le k$. Suppose we know $r_0 = \frac{r(i)}{r(G_i)}$. Consider a set S formed by including each element j independently with probability $r_0$. Then the probability of that set can be written as

$$p(S) = \sum_{j=1}^{k} \mathbb{I}_{j\in S}\, p(j),$$

where $\mathbb{I}_{j\in S}$ is the indicator random variable for $j \in S$. In any such set S, there might be elements that are not from $G_i$. We can prune these elements (refer to them as j') by sampling from the distribution $r_{\{i,j'\}}$ and testing whether j' appears more often than i. Precise probabilistic arguments are given later. Suppose we remove all elements in S that are not in $G_i$. Then

$$p(S) = \sum_{j\in G_i} \mathbb{I}_{j\in S}\, p(j).$$

Since $\Pr(\mathbb{I}_{j\in S} = 1) = r_0$,

$$\mathbb{E}[p(S)] = \sum_{j\in G_i} \mathbb{E}[\mathbb{I}_{j\in S}]\, p(j) = r_0 \sum_{j\in G_i} p(j) = \frac{r(i)}{r(G_i)}\, p(G_i).$$

Similarly, one can show that $\mathbb{E}[q(S)] = \frac{r(i)}{r(G_i)}\, q(G_i)$. Thus $\mathbb{E}[p(S)] < \mathbb{E}[q(S)]$ and

$$\mathbb{E}[p(S)] + \mathbb{E}[q(S)] = \frac{r(i)}{r(G_i)}\bigl(p(G_i)+q(G_i)\bigr) = 2r(i) = p(i) + q(i).$$

Note that to use Test-equal efficiently, we not only need $p(i) > q(i)$ and $\mathbb{E}[p(S)] < \mathbb{E}[q(S)]$, but the chi-squared distance also needs to be large. It can be shown that for this, $p(S)+q(S) \approx p(i)+q(i)$ is necessary, and hence $\mathbb{E}[p(S)] + \mathbb{E}[q(S)] = p(i) + q(i)$ is useful.
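The expectation identity can be checked by simulation. In this sketch (our illustration; the numbers are made up for the example), element i has p(i) = 0.03 and q(i) = 0.01, and $G_i$ consists of i together with 50 lighter elements with p = 0.008 and q = 0.012, so that p(i) > q(i) while $p(G_i) < q(G_i)$. Pruning is assumed to have already removed everything outside $G_i$.

```python
import random

def average_masses(p, q, G, r0, trials, rng):
    """Average p(S) and q(S) over random sets S that keep each j in G
    independently with probability r0 (elements outside G already pruned)."""
    total_p = total_q = 0.0
    for _ in range(trials):
        S = [j for j in G if rng.random() < r0]
        total_p += sum(p[j] for j in S)
        total_q += sum(q[j] for j in S)
    return total_p / trials, total_q / trials

i = 0
p = [0.03] + [0.008] * 50
q = [0.01] + [0.012] * 50
G = list(range(len(p)))                      # G_i, including i itself
r = [(a + b) / 2 for a, b in zip(p, q)]
r0 = r[i] / sum(r[j] for j in G)             # r(i) / r(G_i)
mean_p, mean_q = average_masses(p, q, G, r0, 20000, random.Random(0))
print(mean_p, mean_q, mean_p + mean_q, p[i] + q[i])
```

The empirical means satisfy $\bar{p}(S) < \bar{q}(S)$ and their sum is close to $p(i) + q(i) = 0.04$, as the calculation above predicts.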

Thus in expectation S is a good candidate for a distinguishing set. Hence if we take samples from $S \cup \{i\}$ and compare p(S), p(i) with q(S), q(i), we can test whether p = q or $\|p-q\|_1 \ge \epsilon$.

We therefore have to find an i such that p(i) > q(i) and $p(G_i) < q(G_i)$, estimate $r(i)/r(G_i)$, and convert the above expectation argument into a probabilistic one. While the calculations and analysis in expectation seem natural, judiciously analyzing the success probability of these events takes a fair amount of effort. Furthermore, note that given conditional-sampling access to p and q, one can generate a sample from r by selecting p or q independently with probability 1/2 and then obtaining a conditional sample from the selected distribution; for a query set S this yields a sample from the mixture $(p_S + q_S)/2$, which coincides with $r_S$ whenever $p(S) = q(S)$, in particular for $S = [k]$.


3.2 Finding a distinguishing element i 

We now show that, using an algorithm similar to Find-element, we can find an i such that either ($p(i) > q(i)$ and $p(G_i) < q(G_i)$) or ($p(i) < q(i)$ and $p(G_i) > q(G_i)$). To quantify this statement we need the following definition of β-approximability.

Definition 7. For a pair of distributions p and q, element i is β-approximable if

$$\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)}\right| \ge \beta.$$

As we show later, it is sufficient to consider β-approximable elements instead of elements with p(i) > q(i) and $p(G_i) < q(G_i)$. Thus the first step of our algorithm is to find β-approximable elements. To this end, we show the following.

Lemma 8 (Appendix C.1). If $\|p-q\|_1 \ge \epsilon$, then

$$\sum_i \bigl(p(i)+q(i)\bigr)\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)}\right| \ge \frac{\epsilon}{2}.$$

Hence if we use Find-element for the distribution $r = (p+q)/2$, then one of the tuples would be $\beta_j$-approximable for some $\beta_j$. Note that with

$$a_i = \left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)}\right|,$$

we have $0 \le a_i \le 2$ and $\sum_i r(i)\,a_i \ge \epsilon/4$. By Lemma 3, Find-element outputs a tuple (i, β, α) such that i is α-heavy and β-approximable. Note that although we obtain i and guarantees on $G_i$, the algorithm does not find $G_i$.

Lemma 9. With probability $\ge 1/5$, among the tuples returned by Find-element(ε, r) there exists at least one tuple (i, β, α) such that i is both α-heavy and β-approximable.


3.3 Approximating $r(i)/r(G_i)$ via binary search


Our next goal is to estimate $r_0 = \frac{r(i)}{r(G_i)}$ using samples. It can be easily shown that it is sufficient to know $\frac{r(i)}{r(G_i)}$ up to a multiplicative factor, say γ (we later choose $\gamma = \Theta(\log\log\log k)$). Furthermore, by the definition of $G_i$, $r(G_i) \ge r(i)$, and since every element of $G_i$ has probability at most r(i) under r, $r(G_i) \le k\,r(i)$. Therefore,

$$\frac{1}{k} \le \frac{r(i)}{r(G_i)} \le 1,$$

and $\log k \ge -\log\frac{r(i)}{r(G_i)} \ge 0$. Approximating $\frac{r(i)}{r(G_i)}$ up to a multiplicative factor γ is the same as approximating $\log\frac{r(i)}{r(G_i)}$ up to an additive factor $\log\gamma$. We can thus run our algorithm for $r_0$ corresponding to each value of $-\log r_0$ in $\{0, \log\gamma, 2\log\gamma, 3\log\gamma, \ldots, \log k\}$, and if $\|p-q\|_1 \ge \epsilon$, we output diff for at least one value of $r_0$. Using carefully chosen thresholds we can also ensure that if p = q, the algorithm always outputs same. The sample complexity of this algorithm is about $O(\log k)$ times the complexity when we know $\frac{r(i)}{r(G_i)}$. We improve the sample complexity by using a better search algorithm over $\{0, \log\gamma, 2\log\gamma, 3\log\gamma, \ldots, \log k\}$. We develop a comparator (step 4 in Binary-search) with the following property: if our guess $r_{\mathrm{guess}} > \gamma\frac{r(i)}{r(G_i)}$, it outputs heavy, and if $r_{\mathrm{guess}} < \frac{1}{\gamma}\frac{r(i)}{r(G_i)}$, it outputs light. Using such a comparator, we do a binary search and find the right value faster. Recall that binary search over m elements uses log m queries. For our problem m = log k, and thus our sample complexity is approximately log log k times the sample complexity of the case when we know $\frac{r(i)}{r(G_i)}$.

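However, our comparator cannot identify whether we have a good guess, i.e., whether $\frac{1}{\gamma}\frac{r(i)}{r(G_i)} \le r_{\mathrm{guess}} \le \gamma\frac{r(i)}{r(G_i)}$. Thus, instead of outputting the value of $\frac{r(i)}{r(G_i)}$ up to some approximation factor γ, our binary search finds a set of candidates $r^1_{\mathrm{guess}}, r^2_{\mathrm{guess}}, \ldots$ such that at least one of the $r_{\mathrm{guess}}$'s satisfies

$$\frac{1}{\gamma}\,\frac{r(i)}{r(G_i)} \le r_{\mathrm{guess}} \le \gamma\,\frac{r(i)}{r(G_i)}.$$

Hence, for each value of $r_{\mathrm{guess}}$ we assume that $r_{\mathrm{guess}} \approx \frac{r(i)}{r(G_i)}$ and run the closeness test. For at least one value of $r_{\mathrm{guess}}$ we would be correct. The algorithm is given in Binary-search.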
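The search over $\log r_{\mathrm{guess}}$ can be sketched with an idealized comparator that answers heavy exactly when the guess exceeds the true $r(i)/r(G_i)$. The real comparator (step 4 of Binary-search) is noisy and cannot certify a good guess, which is why every visited value is kept as a candidate. Natural logarithms, $\lceil\log_2\ln k\rceil$ iterations, and the function name are our assumptions.

```python
import math

def binary_search_candidates(k, is_heavy):
    """Binary search over log(r_guess) in [-log k, 0] with the update rule of
    Binary-search, returning every log(r_guess) that was tried.

    is_heavy(log_guess) -> True for "heavy" (guess too large), False for "light".
    """
    low, high = -math.log(k), 0.0
    log_guess = -math.log(math.sqrt(k))      # midpoint of [low, high]
    candidates = []
    for _ in range(math.ceil(math.log2(math.log(k)))):
        candidates.append(log_guess)
        if is_heavy(log_guess):
            high = log_guess
            log_guess = (log_guess + low) / 2
        else:
            low = log_guess
            log_guess = (log_guess + high) / 2
    return candidates

k = 2 ** 64
target = 1e-7                                 # stands in for r(i)/r(G_i)
cands = binary_search_candidates(k, lambda g: g > math.log(target))
print(len(cands), min(abs(c - math.log(target)) for c in cands))
```

The interval containing $\log\frac{r(i)}{r(G_i)}$ halves at every step, so after about $\log_2\ln k$ steps some candidate lies within an additive $\log\gamma$ of it whenever $\ln\gamma \ge 1$; here only six guesses are needed even for $k = 2^{64}$.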

The algorithm Prune-set removes all elements of probability $\ge 4r(i)$, yet does not remove any element of probability $\le r(i)$. Since after pruning S only contains elements of probability $< 4r(i)$, we show that at some point during the log log k steps, the algorithm encounters $r_{\mathrm{guess}} \approx \frac{r(i)}{r(G_i)}$.


Algorithm Prune-set
Input: S, ε, i, α, m, and γ.

Parameters: $n_1 = \ldots$, $n_2 = O\bigl(\log\log\log k + \log\frac{1}{\delta}\bigr)$.

Repeat $n_1$ times:

Obtain a sample j from $r_S$ and sample $n_2$ times from $r_{\{i,j\}}$. If $n(j) > 3n_2/4$, remove j from the set S.
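The pruning rule can be simulated with exact access to r. This is a sketch: r is a dict of unnormalized weights, `n1` and `n2` are supplied by the caller (the text's exact values are not reproduced here), and the function name is ours. The threshold 3/4 separates the two cases because $r_{\{i,j\}}(j) \ge 4/5$ when $r(j) \ge 4r(i)$ and $r_{\{i,j\}}(j) \le 1/2$ when $r(j) \le r(i)$.

```python
import random

def prune_set(S, r, i, n1, n2, rng):
    """Sketch of Prune-set: each round draws j from r restricted to the current
    S, then draws n2 samples from r restricted to {i, j}; j is removed if it
    wins more than 3/4 of them."""
    S = set(S)
    for _ in range(n1):
        if not S:
            break
        elems = sorted(S)
        j = rng.choices(elems, weights=[r[e] for e in elems])[0]
        share_j = r[j] / (r[i] + r[j])          # r_{i,j}(j)
        wins = sum(rng.random() < share_j for _ in range(n2))
        if wins > 3 * n2 / 4:
            S.discard(j)
    return S

r = {0: 0.05, 1: 0.4, 12: 0.03}                 # element 0 plays the role of i
r.update({j: 0.01 for j in range(2, 12)})
pruned = prune_set(set(range(1, 13)), r, i=0, n1=50, n2=200, rng=random.Random(3))
print(sorted(pruned))
```

In this example element 1, with $r(1) = 0.4 \ge 4r(i)$, is removed, while every element lighter than i survives.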
Algorithm Binary-search
Input: Tuple (i, β, α).

Parameters: $\gamma = 1000\log(1/\delta')$, $n_3 = O(\gamma^2\log(1/\delta'))$.

Initialize $\log r_{\text{guess}} = -\log\sqrt{k}$. Set low $= -\log k$ and high $= 0$. Do log log k times:

1. Create a set S by independently keeping each element of $\{1, 2, \ldots, k\}\setminus\{i\}$ with probability $r_{\text{guess}}$.

2. Prune S using Prune-set(S, ε, i, α, 1, γ).

3. Run Assisted-closeness-test($r_{\text{guess}}$, (i, β, α), ε, δ).

4. Obtain $n_3$ samples from $S \cup \{i\}$. If $n(i) < 5n_3/\gamma$ then output heavy, else output light.

(a) If output is heavy, update high $= \log r_{\text{guess}}$ and $\log r_{\text{guess}} = (\log r_{\text{guess}} + \text{low})/2$.

(b) If output is light, update low $= \log r_{\text{guess}}$ and $\log r_{\text{guess}} = (\log r_{\text{guess}} + \text{high})/2$.

5. If any of the Assisted-closeness-tests return diff, then output diff.
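The search over $\log r_{\text{guess}}$ can be sketched as below. As an assumption for illustration, the heavy/light comparator of step 4 is abstracted as a hypothetical callable `is_heavy`, logarithms are taken base 2, and the sketch records every visited guess, since the comparator cannot certify which candidate is good; in the algorithm, each candidate is handed to Assisted-closeness-test.

```python
import math

def binary_search_guesses(k, is_heavy, iterations=None):
    """Binary search over log2(r_guess) in [-log2 k, 0]; returns every visited guess."""
    low, high = -math.log2(k), 0.0
    log_guess = (low + high) / 2              # r_guess = 1/sqrt(k)
    if iterations is None:
        iterations = math.ceil(math.log2(math.log2(k)))   # log log k steps
    guesses = []
    for _ in range(iterations):
        guesses.append(2 ** log_guess)        # keep every candidate
        if is_heavy(2 ** log_guess):          # guess too large: search the lower half
            high = log_guess
            log_guess = (log_guess + low) / 2
        else:                                 # guess too small: search the upper half
            low = log_guess
            log_guess = (log_guess + high) / 2
    return guesses
```

With an exact comparator, [low, high] halves every iteration and always contains the target, so after ⌈log₂ log₂ k⌉ iterations some recorded guess is within a factor of 2 of it; for the randomized comparator of step 4, Lemma 10 gives the corresponding high-probability statement.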


Lemma 10 (Appendix C.2). If i is α-heavy and β-approximable, then the algorithm Binary-search, with probability ≥ 1 − δ, reaches $r_{\text{guess}}$ such that

$$\frac{r(i)}{\gamma} = \frac{r(G_i)}{\gamma}\cdot\frac{r(i)}{r(G_i)} \le r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}.$$

Note that due to technical reasons we get an additional 1/β factor in the upper bound and a factor of $r(G_i)$ in the lower bound.


3.4 Assisted closeness test 


We now discuss the proposed test, which uses the above value of $r_{\text{guess}}$. As stated before, in expectation it would be sufficient to keep each element in the set S with probability $r_{\text{guess}}$ and use the resulting set S to test for closeness. However, there are two caveats. First, Prune-set can only remove elements with probability larger than 4r(i). We can reduce the factor 4 to any number > 1, but we can never reduce it to 1: for sufficiently small δ', an element with probability (1 + δ')r(i) is almost indistinguishable from an element with probability (1 − δ')r(i). Thus we need a way of ensuring that elements with probability between r(i) and 4r(i) do not affect the concentration inequalities.

Second, since we only have an approximate value of $r(i)/r(G_i)$, the probability that the required quantities concentrate is small, and we have to repeat the procedure many times to obtain a higher probability of success. Our algorithm addresses both these issues and is given below.

The algorithm picks m sets and prunes them to ensure that none of the elements has probability > 4r(i), and considers two possibilities: either there exist many elements j such that $j \notin G_i$ and

$$\left|\frac{p(i)-q(i)}{r(i)} - \frac{p(j)-q(j)}{r(j)}\right| > \beta'' \quad (\beta'' \text{ determined later}),$$

or the number of such elements is small. In the first case, the algorithm finds such an element j and performs Test-equal over the set {i, j}. Otherwise, we show that r(S) ≈ r(i), that it concentrates, and that with high probability

$$\left|\frac{p(i)-q(i)}{r(i)} - \frac{p(S)-q(S)}{r(S)}\right| > \beta'',$$

and thus one can sample from $S \cup \{i\}$ and use n(i) to test closeness.

To conclude, the proposed Closeness-test uses Find-element to find a distinguishing element i. It then runs Binary-search to approximate $r(i)/r(G_i)$. However, since the search does not identify whether it has found a good estimate of $r(i)/r(G_i)$, for each estimate it runs Assisted-closeness-test, which uses the distinguishing element i and the estimate of $r(i)/r(G_i)$. The main result in this section is the sample complexity of our proposed Closeness-test.


Theorem 11 (Appendix C.3). If p = q, then Closeness-test returns same with probability ≥ 1 − δ, and if $\|p - q\|_1 \ge \epsilon$, then Closeness-test returns diff with probability ≥ 1/30. The algorithm uses $N^*(k, \epsilon) \le \tilde{O}(\epsilon^{-5}\log\log k)$ samples.

As stated in the previous section, by repeating the test and taking a majority, the success probability can be boosted arbitrarily close to 1. Note that none of the constants or the error probabilities have been optimized. Constants for all the parameters except the sample complexities $n_1, n_2, n_3$, and $n_4$ have been given.


Algorithm Closeness-test
Input: ε, oracles p, q.

1. Generate a set of tuples using Find-element(ε, r).

2. For every tuple (i, β, α), run Binary-search(i, β, α).

3. If any of the Binary-searches returned diff, output diff; otherwise output same.


Algorithm Assisted-closeness-test
Input: $r_{\text{guess}}$, tuple (i, β, α), ε, and δ.

Parameters: $\beta'' = \frac{\alpha\beta}{128\gamma\log\frac{128\gamma}{\alpha\beta^2}}$, m = …, $n_4 = O(\gamma/(\alpha\beta))$, and $\delta' = \frac{\epsilon\delta}{32m(n_4+1)\log\log k}$.

1. Create $S_1, S_2, \ldots, S_m$ independently by keeping each element of $\{1, 2, \ldots, k\}\setminus\{i\}$ with probability $r_{\text{guess}}$.

2. Run Prune-set($S_\ell$, ε, i, α, m, γ) for $1 \le \ell \le m$.

3. For each set S do:

(a) Take $n_4$ samples from $r_{S\cup\{i\}}$ and for all seen elements j, run Test-equal($\beta''^2/25$, δ', $p_{\{i,j\}}$, $q_{\{i,j\}}$).

(b) Let s = {{i}, S}. Run Test-equal(…, δ', $p_s$, $q_s$).

4. If any of the above tests return diff, output diff.
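The control flow of Assisted-closeness-test is sketched below. The callables `prune`, `sample_from`, `test_pair`, and `test_set` are hypothetical placeholders for Prune-set, conditional sampling from $r_{S\cup\{i\}}$, Test-equal on {i, j}, and Test-equal on ({i}, S); nothing here fixes their parameters. The sketch returns as soon as one test reports diff, which yields the same output as step 4.

```python
import random
from typing import Callable, Iterable, Set

def assisted_closeness_test(
    r_guess: float, i: int, k: int, m: int, n4: int, rng: random.Random,
    prune: Callable[[Set[int]], Set[int]],
    sample_from: Callable[[Set[int], int], Iterable[int]],
    test_pair: Callable[[int, int], str],
    test_set: Callable[[int, Set[int]], str],
) -> str:
    """Control-flow sketch; the four callables are hypothetical stand-ins."""
    others = [e for e in range(1, k + 1) if e != i]
    # Step 1: m independent subsets, each element of {1..k} \ {i} kept w.p. r_guess.
    sets = [{e for e in others if rng.random() < r_guess} for _ in range(m)]
    # Step 2: prune every subset (Prune-set).
    sets = [prune(S) for S in sets]
    for S in sets:
        # Step 3(a): test {i, j} for every j seen among n4 samples from S ∪ {i}.
        seen = {x for x in sample_from(S | {i}, n4) if x != i}
        if any(test_pair(i, j) == "diff" for j in seen):
            return "diff"
        # Step 3(b): test the two-point distribution on ({i}, S).
        if test_set(i, S) == "diff":
            return "diff"
    # Step 4: no test reported a difference.
    return "same"
```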
A Tools


We use the following variation of the Chernoff bound. 

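Lemma 12 (Chernoff bound). If $X_1, X_2, \ldots, X_n$ are independent and distributed according to Bernoulli(p), then

$$\Pr\left(\frac{\sum_{i=1}^n X_i}{n} - p \ge \delta\right) \le e^{-2n\delta^2} \quad\text{and}\quad \Pr\left(\frac{\sum_{i=1}^n X_i}{n} - p \le -\delta\right) \le e^{-2n\delta^2}.$$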
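A quick numerical sanity check of Lemma 12, and of how it is used for majority-vote amplification (for example, the 18 log(1/δ) repetitions in the proof of Lemma 1), compares the exact binomial tail with $e^{-2n\delta^2}$. The function names are illustrative, and natural logarithms are assumed.

```python
import math

def binom_upper_tail(n, p, dev):
    """Exact Pr(sum_i X_i / n - p >= dev) for i.i.d. X_i ~ Bernoulli(p)."""
    k_min = math.ceil(n * (p + dev) - 1e-9)   # small slack guards against float round-off
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(max(k_min, 0), n + 1))

def chernoff_bound(n, dev):
    """Right-hand side of Lemma 12: exp(-2 n dev^2)."""
    return math.exp(-2 * n * dev**2)

def majority_failure(reps, err):
    """Pr(at least half of `reps` independent runs are wrong) when each run errs w.p. `err`."""
    return binom_upper_tail(reps, err, 0.5 - err)
```

With per-run error 1/3, the majority fails only if the empirical error rate exceeds its mean by 1/6, so Lemma 12 gives failure probability at most $e^{-\text{reps}/18}$, which is at most δ once reps ≥ 18 log(1/δ).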


The following lemma follows from a bound in Acharya et al. [2012] and the fact that $y^2e^{-y}$ is
bounded for all non-negative values of y.

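Lemma 13 (Acharya et al. [2012]). For two independent Poisson random variables μ and μ' with means λ and λ' respectively,

$$E\left[\frac{(\mu-\mu')^2-\mu-\mu'}{\mu+\mu'-1}\right] = \frac{(\lambda-\lambda')^2}{\lambda+\lambda'}\left(1-e^{-\lambda-\lambda'}\right),$$

$$\mathrm{Var}\left[\frac{(\mu-\mu')^2-\mu-\mu'}{\mu+\mu'-1}\right] \le 4\,\frac{(\lambda-\lambda')^2}{\lambda+\lambda'} + c^2,$$

where c is a universal constant.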


B Identity testing proofs 

B.1 Proof of Lemma 1

Let t denote the test statistic of Test-equal. With poi(n) samples, $n_1$, $n_2$, $n' - n_1$, $n'' - n_2$ are all independent Poisson random variables with means np, nq, n(1 − p), and n(1 − q) respectively. Suppose the underlying hypothesis is p = q. By Lemma 13, E[t] = 0, and since $n_1, n_2, n' - n_1, n'' - n_2$ are all independent Poisson random variables, the variance of t is the sum of the variances of each term, and hence $\mathrm{Var}(t) \le 2c^2$ for some universal constant c. Thus by Chebyshev's inequality

$$\Pr(t > n\epsilon/2) \le \frac{8c^2}{(n\epsilon)^2} \le \frac{1}{3}.$$

Hence by the Chernoff bound (Lemma 12), after 18 log(1/δ) repetitions the probability that the majority of outputs is same is ≥ 1 − δ. Suppose the underlying hypothesis is $\frac{(p-q)^2}{(p+q)(2-p-q)} \ge \epsilon$. Then by Lemma 13

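$$E[t] = \left(1-e^{-n(p+q)}\right)\frac{n(p-q)^2}{p+q} + \left(1-e^{-n(2-p-q)}\right)\frac{n(p-q)^2}{2-p-q} \overset{(a)}{\ge} \left(1-e^{-n\epsilon}\right)\frac{2n(p-q)^2}{(p+q)(2-p-q)} \overset{(b)}{\ge} \frac{n(p-q)^2}{(p+q)(2-p-q)}.$$

(a) follows from the fact that p + q ≥ ε and similarly 2 − p − q ≥ ε; both follow from $\frac{(p-q)^2}{(p+q)(2-p-q)} \ge \epsilon$ together with $|p-q| \le \min(p+q,\, 2-p-q)$. (b) follows from the fact that nε ≥ 10. Similarly the variance is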


$$\mathrm{Var}(t) \le \frac{4n(p-q)^2}{p+q} + \frac{4n(p-q)^2}{2-p-q} + 2c^2 = \frac{8n(p-q)^2}{(p+q)(2-p-q)} + 2c^2.$$

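Thus again by Chebyshev's inequality,

$$\Pr(t < n\epsilon/2) \le \frac{\frac{8n(p-q)^2}{(p+q)(2-p-q)} + 2c^2}{\left(\frac{n(p-q)^2}{(p+q)(2-p-q)} - \frac{n\epsilon}{2}\right)^2} \le \frac{32}{n\epsilon} + \frac{8c^2}{(n\epsilon)^2} \le \frac{1}{3}.$$

The second inequality uses $\frac{n(p-q)^2}{(p+q)(2-p-q)} \ge n\epsilon$, and the last inequality holds when n is a sufficiently large constant multiple of 1/ε. The lemma follows by the Chernoff bound argument as before.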


B.2 Proof of Lemma 3 

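Let $A_j$ be the event that $a_{x_j} \ge \beta_j$ and $x_j$ is $\alpha_j$-heavy. Since we choose each tuple j independently at time j, the events $A_j$ are independent. Hence,

$$\Pr(\cup_j A_j) = 1 - \Pr(\cap_j A_j^c) = 1 - \prod_{j=1}^m \Pr(A_j^c) = 1 - \prod_{j=1}^m \left(1 - \Pr(A_j)\right) \ge 1 - e^{-\sum_{j=1}^m \Pr(A_j)}.$$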


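Let $B_j = \{i : a_i \ge \beta_j\}$. Since all elements in $B_j$ count towards $A_j$ except the last $\alpha_j$ part, $\Pr(A_j) \ge p(B_j) - \alpha_j$. Thus

$$\sum_{j=1}^m \Pr(A_j) \ge \sum_{j=1}^m p(B_j) - \sum_{j=1}^m \alpha_j \ge \sum_{j=1}^m p(B_j) - \frac{\log m}{4\log(16/\epsilon)}.$$

We now show that $\sum_{j=1}^m p(B_j) \ge 1/2$, thus proving that $\sum_{j=1}^m \Pr(A_j) \ge 1/4$ and $\Pr(\cup_j A_j) \ge 1/5$.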

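Since $\sum_i p(i)a_i \ge \epsilon/4$,

$$\sum_i p(i)a_i = \sum_{i:a_i\ge\epsilon/8} p(i)a_i + \sum_{i:a_i<\epsilon/8} p(i)a_i \le \sum_{i:a_i\ge\epsilon/8} p(i)a_i + \epsilon/8.$$

Thus

$$\sum_{i:a_i\ge\epsilon/8} p(i)a_i \ge \frac{\epsilon}{8}, \quad\text{i.e.,}\quad \sum_{i:a_i\ge\epsilon/8} p(i)\,\frac{8a_i}{\epsilon} \ge 1.$$

In $\sum_{j=1}^m p(B_j)$, each p(i) is counted exactly ⌊…⌋ times, which yields $\sum_{j=1}^m p(B_j) \ge 1/2$.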

B.3 Proof of Lemma 5 

The proof uses the following auxiliary lemma. Let $p_{i,j} \stackrel{\text{def}}{=} p_{\{i,j\}}(i) = \frac{p(i)}{p(i)+p(j)}$ denote the probability of i under conditional sampling from {i, j}, and define $q_{i,j}$ similarly.

Lemma 14. If

$$\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(j)-q(j)}{p(j)+q(j)}\right| \ge \epsilon,$$

then

$$\frac{(p_{i,j}-q_{i,j})^2}{(p_{i,j}+q_{i,j})(2-p_{i,j}-q_{i,j})} \ge \frac{\epsilon^2(p(i)+q(i))^2(p(j)+q(j))^2}{4[p(i)(q(i)+q(j)) + q(i)(p(i)+p(j))][p(j)(q(i)+q(j)) + q(j)(p(i)+p(j))]} \ge \frac{\epsilon^2(p(i)+q(i))(p(j)+q(j))}{4(p(i)+q(i)+p(j)+q(j))^2}.$$

Proof. Let s(i) = p(i) + q(i) and s(j) = p(j) + q(j). Upon expanding,

$$\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(j)-q(j)}{p(j)+q(j)} = \frac{2(p(i)q(j) - p(j)q(i))}{s(i)s(j)}. \qquad (1)$$

Furthermore, $p_{i,j} = \frac{p(i)}{p(i)+p(j)}$ and similarly $q_{i,j} = \frac{q(i)}{q(i)+q(j)}$. Hence,

$$\frac{(p_{i,j}-q_{i,j})^2}{(p_{i,j}+q_{i,j})(2-p_{i,j}-q_{i,j})} = \frac{(p(i)q(j)-q(i)p(j))^2}{[p(i)(q(i)+q(j)) + q(i)(p(i)+p(j))][p(j)(q(i)+q(j)) + q(j)(p(i)+p(j))]} \qquad (2)$$

$$\overset{(a)}{\ge} \frac{\epsilon^2 s^2(i)s^2(j)}{4[p(i)(q(i)+q(j)) + q(i)(p(i)+p(j))][p(j)(q(i)+q(j)) + q(j)(p(i)+p(j))]} \overset{(b)}{\ge} \frac{\epsilon^2 s^2(i)s^2(j)}{4(s(i)+s(j))^2 s(i)s(j)} = \frac{\epsilon^2 s(i)s(j)}{4(s(i)+s(j))^2}.$$

(a) follows by Equations (1) and (2). (b) follows from $p(i)(q(i)+q(j)) + q(i)(p(i)+p(j)) \le s(i)(s(i)+s(j))$ and $p(j)(q(i)+q(j)) + q(j)(p(i)+p(j)) \le s(j)(s(i)+s(j))$. ■
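Equation (2) and the final bound of Lemma 14 are easy to check numerically. The sketch below draws random positive values for p(i), p(j), q(i), q(j) (they only enter through ratios, so they need not sum to one), computes the chi-squared distance between $p_{i,j}$ and $q_{i,j}$ directly, and compares it with Equation (2) and with the lower bound, taking ε to be the actual gap between the two ratios. The function names are illustrative.

```python
import random

def cond_chi2(pi, pj, qi, qj):
    """Chi-squared distance of Lemma 14 between p_{i,j} and q_{i,j}."""
    a = pi / (pi + pj)          # p_{i,j}: probability of i when conditioning on {i, j}
    b = qi / (qi + qj)          # q_{i,j}
    return (a - b) ** 2 / ((a + b) * (2 - a - b))

def identity_2(pi, pj, qi, qj):
    """Right-hand side of Equation (2)."""
    num = (pi * qj - qi * pj) ** 2
    den = (pi * (qi + qj) + qi * (pi + pj)) * (pj * (qi + qj) + qj * (pi + pj))
    return num / den

def lemma14_lower_bound(pi, pj, qi, qj):
    """Final bound of Lemma 14 with eps equal to the actual gap between the two ratios."""
    si, sj = pi + qi, pj + qj
    eps = abs((pi - qi) / si - (pj - qj) / sj)
    return eps ** 2 * si * sj / (4 * (si + sj) ** 2)
```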

Proof (Lemma 5). We first show that if p = q, then Near-uniform-identity-test returns same with probability ≥ 1 − δ. By Lemma 1, Test-equal has error probability $\delta_j = 6\delta/(\pi^2 j^2)$ for the jth tuple, and hence by the union bound the overall error is $\le \sum_j \delta_j \le \delta$.

If p ≠ q, with probability ≥ 1/5, Find-element returns an element x that is α-heavy and satisfies q(x) − p(x) ≥ βq(x). For this x and y, since p(y) ≥ q(y),

$$\frac{q(x)-p(x)}{p(x)+q(x)} - \frac{q(y)-p(y)}{p(y)+q(y)} \ge \frac{\beta q(x)}{p(x)+q(x)}.$$

By Lemma 14 the chi-squared distance between $p_{\{x,y\}}$ and $q_{\{x,y\}}$ is lower bounded by

$$\left(\frac{\beta q(x)}{p(x)+q(x)}\right)^2 \frac{(p(x)+q(x))^2(p(y)+q(y))^2}{4[p(x)(q(x)+q(y)) + q(x)(p(x)+p(y))][p(y)(q(x)+q(y)) + q(y)(p(x)+p(y))]}$$

$$\overset{(a)}{\ge} \frac{\beta^2q^2(x)p^2(y)}{4[p(x)(q(x)+p(y)) + q(x)(p(x)+p(y))][p(y)(q(x)+p(y)) + p(y)(p(x)+p(y))]}$$

$$\overset{(b)}{\ge} \frac{\beta^2q^2(x)p^2(y)}{4[2p(y)q(x) + p(x)p(y) + q(x)(2p(y)+p(y))][p(y)(q(x)+2p(x)) + p(y)(p(x)+2p(x))]}$$

$$\overset{(c)}{\ge} \frac{\beta^2q^2(x)p^2(y)}{4[2p(y)q(x) + q(x)p(y) + q(x)(2p(y)+p(y))][p(y)(q(x)+2q(x)) + p(y)(q(x)+2q(x))]} \overset{(d)}{=} \frac{\beta^2}{144}.$$


(a) follows from the fact that p(y) ≥ q(y). Since p(x) ≤ 2p(y) and p(y) ≤ 2p(x), (b) follows. p(x) ≤ q(x) implies (c), and (d) follows from numerical simplification. Thus by Lemma 1 the algorithm returns diff with probability ≥ 1 − δ. By the union bound, the total error probability is ≤ 4/5 + δ. The number of samples used is 16/ε for the first step and $O\left(\frac{1}{\beta_j^2}\log\frac{1}{\delta_j}\right)$ for tuple j. Hence the total number of samples used is

$$\frac{16}{\epsilon} + \sum_{j} O\left(\frac{1}{\beta_j^2}\log\frac{1}{\delta_j}\right).$$
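The hypotheses used in the chain above (q(x) − p(x) ≥ βq(x), p(y) ≥ q(y), and p(x)/2 ≤ p(y) ≤ 2p(x)) can be sampled at random to sanity-check the bound β²/144, together with the fact that the error budgets $\delta_j = 6\delta/(\pi^2 j^2)$ sum to at most δ. The sampling ranges and function names below are illustrative assumptions.

```python
import math
import random

def cond_chi2(px, py, qx, qy):
    """Chi-squared distance between p_{x,y} and q_{x,y} (conditional distributions on {x, y})."""
    a, b = px / (px + py), qx / (qx + qy)
    return (a - b) ** 2 / ((a + b) * (2 - a - b))

def random_instance(rng):
    """Draw values satisfying the hypotheses used in the proof of Lemma 5."""
    beta = rng.uniform(0.01, 0.99)
    qx = rng.uniform(0.01, 1.0)
    px = rng.uniform(1e-6, (1 - beta) * qx)   # q(x) - p(x) >= beta q(x), hence p(x) <= q(x)
    py = rng.uniform(px / 2, 2 * px)          # p(x) <= 2 p(y) and p(y) <= 2 p(x)
    qy = rng.uniform(0.0, py)                 # p(y) >= q(y)
    return beta, px, qx, py, qy
```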



B.4 Proof of Theorem 6 


We restate the theorem for readability: If p = q, then Identity-test returns same with probability ≥ 1 − δ, and if $\|p - q\|_1 \ge \epsilon$, then Identity-test returns diff with probability ≥ 1/30.

Recall that there are 16/ε tuples. Also observe that all three tests inside Identity-test are called with error parameter εδ/48. As a result, if p = q, Identity-test outputs same with probability

$$\ge 1 - 3\cdot\frac{\epsilon\delta}{48}\cdot\frac{16}{\epsilon} = 1 - \delta.$$
We now show that if $\|p - q\|_1 \ge \epsilon$, then the algorithm outputs diff with probability ≥ 1/30. By Lemma 4, with probability ≥ 1/5 Find-element returns an element x that is α-heavy and satisfies p(x) − q(x) ≥ βp(x). Partition $G_x$ into groups $\mathcal{H} = \{H_1, H_2, \ldots\}$ such that for each group $H_j$, $p(x) \le p(H_j) \le 2p(x)$, and let $p_{G_x}$ and $q_{G_x}$ be the corresponding induced distributions. There are three possible cases. We show that for any q, at least one of the subroutines in Identity-test outputs diff with high probability.

1. $|p(G_x) - q(G_x)| \ge \alpha\beta/5$.

2. $|p(G_x) - q(G_x)| < \alpha\beta/5$ and $\|p_{G_x} - q_{G_x}\|_1 \ge \beta/5$.

3. $|p(G_x) - q(G_x)| < \alpha\beta/5$ and $\|p_{G_x} - q_{G_x}\|_1 < \beta/5$.

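If $|p(G_x) - q(G_x)| \ge \alpha\beta/5$, then the chi-squared distance between the corresponding distributions tested in step 2c is at least $(\alpha\beta/5)^2$, and hence Test-equal($(\alpha\beta/5)^2$, …) (step 2c) outputs diff with probability ≥ 1 − εδ/48.

If $|p(G_x) - q(G_x)| < \alpha\beta/5$ and $\|p_{G_x} - q_{G_x}\|_1 \ge \beta/5$, then by Lemma 5 Near-uniform-identity-test(…) outputs diff with probability ≥ 1/5 − εδ/48 ≥ 1/6.

If $|p(G_x) - q(G_x)| < \alpha\beta/5$ and $\|p_{G_x} - q_{G_x}\|_1 < \beta/5$, then

$$\sum_{y\in\mathcal{H}} \frac{p(y)}{p(G_x)}\left|\frac{p(y)-q(y)}{p(y)}\right| = \sum_{y\in\mathcal{H}}\frac{|p(y)-q(y)|}{p(G_x)} \overset{(a)}{\le} \sum_{y\in\mathcal{H}}\left|\frac{p(y)}{p(G_x)}-\frac{q(y)}{q(G_x)}\right| + \sum_{y\in\mathcal{H}} q(y)\left|\frac{1}{q(G_x)}-\frac{1}{p(G_x)}\right| \overset{(b)}{\le} \frac{\beta}{5} + \frac{|p(G_x)-q(G_x)|}{p(G_x)} \overset{(c)}{\le} \frac{2\beta}{5}.$$

(a) follows from the triangle inequality, (b) follows from the fact that $\|p_{G_x} - q_{G_x}\|_1 \le \beta/5$, and $p(G_x) \ge \alpha$ together with $|p(G_x) - q(G_x)| \le \alpha\beta/5$ gives (c). Therefore, by Markov's inequality, for a random sample y from $p_{G_x}$, with probability ≥ 1/2, $p(y) - q(y) \le \frac{4\beta}{5}p(y)$. Let $q(y)/p(y) = \beta' \ge 1 - \frac{4\beta}{5}$, and furthermore let $q(x)/p(x) = \beta'' \le 1 - \beta$. Hence $\beta' - \beta'' \ge \beta/5$.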

Thus, similar to the proof of Lemma 14, the chi-squared distance between $p_{\{x,y\}}$ and $q_{\{x,y\}}$ can be lower bounded by

$$\frac{(p(x)q(y) - q(x)p(y))^2}{[p(x)(q(x)+q(y)) + q(x)(p(x)+p(y))][p(y)(q(x)+q(y)) + q(y)(p(x)+p(y))]}$$

$$\overset{(a_1)}{=} \frac{(\beta'-\beta'')^2p^2(x)p^2(y)}{[p(x)(q(x)+q(y)) + q(x)(p(x)+p(y))][p(y)(q(x)+q(y)) + q(y)(p(x)+p(y))]} \overset{(a_2)}{\ge} \frac{(\beta'-\beta'')^2}{\max^2(1,\beta',\beta'')}\cdot\frac{p(x)p(y)}{4(p(x)+p(y))^2} \overset{(b)}{\ge} \frac{(\beta'-\beta'')^2}{18\max^2(1,\beta',\beta'')} \overset{(c)}{\ge} \frac{\beta^2}{1800}.$$

$(a_1)$ and $(a_2)$ follow by substituting $q(y) = \beta' p(y)$ and $q(x) = \beta'' p(x)$. (b) follows from p(x) ≤ 2p(y) and p(y) ≤ 2p(x). Since β'' ≤ 1 and β' − β'' ≥ β/5, the RHS in (b) is minimized by β' = 1 + β/5 and β'' = 1. For these values of β', β'', max(1, β', β'') ≤ 2, and hence (c). Thus Test-equal outputs diff with probability ≥ 1 − εδ/48.

If $\|p - q\|_1 \ge \epsilon$, then by Lemma 4 step 1 picks a tuple (x, β, α) such that p(x) − q(x) ≥ p(x)β with probability at least 1/5. Conditioned on this event, for the three cases discussed above the minimum probability of outputting diff is 1/6, and every p, q falls into one of the three categories. Hence with probability ≥ 1/30 Identity-test outputs diff.

We now compute the sample complexity of Identity-test. Step 1 of the algorithm uses 16/ε samples. For every tuple (x, β, α), step 2(c) of the algorithm uses O(…) samples; summing over all tuples yields a sample complexity of O(…). For the different tuples, Test-equal reuses samples, and as $\alpha\beta = \Omega(\epsilon/\log(1/\epsilon))$, it uses a total of O(…) samples. Furthermore, Near-uniform-identity-test uses O(…) samples for each tuple; summing over all tuples, the sample complexity is O(…). Summing over all three cases gives the sample complexity of the algorithm, O(…).


C Closeness testing proofs 


C.1 Proof of Lemma 8

Recall that $G_i = \{j : j \ge i\}$. Let $r_j = \frac{p(j)+q(j)}{2}$ and $s_j = \frac{p(j)-q(j)}{2}$. We will use the following properties: $\sum_{i=1}^k r_i = 1$, $\sum_{i=1}^k s_i = 0$, and $\sum_{i=1}^k |s_i| \ge \epsilon/2$. We show that

$$\sum_{i=1}^k r_i\left|\frac{s_i}{r_i} - \frac{\sum_{j\ge i} s_j}{\sum_{j\ge i} r_j}\right| \ge \frac{1}{2}\sum_{i=1}^k |s_i| \ge \frac{\epsilon}{4}.$$

We first show that

$$\sum_{i=1}^k r_i\left|\frac{s_i}{r_i} - \frac{\sum_{j\ge i} s_j}{\sum_{j\ge i} r_j}\right| \ge \frac{|s_1|+|s_2|-|s_1+s_2|}{2} + (r_1+r_2)\left|\frac{s_1+s_2}{r_1+r_2} - \frac{\sum_{j=1}^k s_j}{\sum_{j=1}^k r_j}\right| + \sum_{i=3}^k r_i\left|\frac{s_i}{r_i} - \frac{\sum_{j\ge i} s_j}{\sum_{j\ge i} r_j}\right|,$$

thus reducing the problem from k indices to k − 1 indices, with $s_1, s_2, \ldots, s_k$ going to $s_1+s_2, s_3, \ldots, s_k$ and $r_1, r_2, \ldots, r_k$ going to $r_1+r_2, r_3, \ldots, r_k$. Continuing similarly, we can reduce the k − 1 indices to k − 2 indices with terms $s_1+s_2+s_3, s_4, \ldots, s_k$ and $r_1+r_2+r_3, r_4, \ldots, r_k$, and so on. Telescopically adding, the sum is at least

$$\frac{|s_1|+|s_2|-|s_1+s_2|}{2} + \frac{|s_1+s_2|+|s_3|-|s_1+s_2+s_3|}{2} + \cdots = \frac{1}{2}\sum_{i=1}^k |s_i| \ge \frac{\epsilon}{4},$$

where the equality follows from the fact that $\sum_{j=1}^k s_j = 0$. To prove the required inductive step, it suffices to show

$$r_1\left|\frac{s_1}{r_1} - \frac{\sum_{j=1}^k s_j}{\sum_{j=1}^k r_j}\right| + r_2\left|\frac{s_2}{r_2} - \frac{\sum_{j\ge 2} s_j}{\sum_{j\ge 2} r_j}\right| \ge \frac{|s_1|+|s_2|-|s_1+s_2|}{2} + |s_1+s_2|,$$

where the right-hand side uses $\sum_{j=1}^k s_j = 0$. Rewriting the left-hand side using $\sum_{j=1}^k s_j = 0$ (so that $\sum_{j\ge 2} s_j = -s_1$), it equals

$$|s_1| + r_2\left|\frac{s_2}{r_2} + \frac{s_1}{r'}\right|,$$

where $r' = \sum_{j\ge 2} r_j \ge r_2$. Thus it suffices to show

$$|s_1| + \left|s_2 + \frac{r_2}{r'}s_1\right| \ge \frac{|s_1|+|s_2|+|s_1+s_2|}{2}.$$

We prove it by considering three sub-cases: $s_1, s_2$ have the same sign; $s_1, s_2$ have different signs but $|s_1| \ge |s_2|$; and $s_1, s_2$ have different signs but $|s_1| < |s_2|$. If $s_1, s_2$ have the same sign, then

$$|s_1| + \left|s_2 + \frac{r_2}{r'}s_1\right| \ge |s_1| + |s_2| = \frac{|s_1|+|s_2|+|s_1+s_2|}{2}.$$

If $s_1$ and $s_2$ have different signs and $|s_1| \ge |s_2|$, then

$$|s_1| + \left|s_2 + \frac{r_2}{r'}s_1\right| \ge |s_1| = \frac{|s_1|+|s_1|}{2} = \frac{|s_1|+|s_2|+|s_1+s_2|}{2}.$$

If $s_1$ and $s_2$ have different signs and $|s_1| < |s_2|$, then

$$|s_1| + \left|s_2 + \frac{r_2}{r'}s_1\right| \ge |s_1| + |s_2| - \frac{r_2}{r'}|s_1| \ge |s_2| = \frac{|s_1|+|s_2|+|s_1+s_2|}{2}.$$
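The inequality proved above, $\sum_i r_i|s_i/r_i - \sum_{j\ge i}s_j/\sum_{j\ge i}r_j| \ge \frac12\sum_i|s_i|$, can be spot-checked on random distributions. In the sketch, p and q are random distributions over k elements (an assumption for testing only), and the tail sums are accumulated from the last index backwards so the whole sum takes O(k) time.

```python
import random

def lemma8_lhs(r, s):
    """sum_i r_i * |s_i / r_i - (sum_{j >= i} s_j) / (sum_{j >= i} r_j)|."""
    total, tail_r, tail_s = 0.0, 0.0, 0.0
    for ri, si in zip(reversed(r), reversed(s)):   # walk backwards to build tail sums
        tail_r += ri
        tail_s += si
        total += ri * abs(si / ri - tail_s / tail_r)
    return total

def random_distribution(k, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    z = sum(w)
    return [x / z for x in w]
```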
C.2 Proof of Lemma 10

We prove this lemma using several smaller sub-results. We first state a concentration result, which follows from Bernstein's inequality.

Lemma 15. Consider a set G such that $\max_{j\in G} r(j) \le r_{\max}$. Consider a set S formed by selecting each element of G independently with probability $r_0$. Then

$$E[r(S)] = r_0\, r(G),$$

and with probability ≥ 1 − 2δ,

$$|r(S) - E[r(S)]| \le \sqrt{2r_0 r_{\max} r(G)\log\frac{1}{\delta}} + r_{\max}\log\frac{1}{\delta}.$$

Furthermore,

$$E[|S|] = r_0|G|,$$

and with probability ≥ 1 − 2δ,

$$\big||S| - r_0|G|\big| \le \sqrt{2r_0|G|\log\frac{1}{\delta}} + \log\frac{1}{\delta}.$$
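A Monte Carlo sketch of the first part of Lemma 15 repeatedly forms S by keeping each element of G with probability $r_0$ and counts how often $|r(S) - r_0 r(G)|$ exceeds the stated radius. The weights, $r_0$, δ and the number of trials are arbitrary test values, and the function names are illustrative; Lemma 15 guarantees a violation rate of at most 2δ, and in practice the rate is far smaller because the bound is not tight.

```python
import math
import random

def lemma15_radius(r0, r_max, r_G, delta):
    """Deviation allowed by Lemma 15: sqrt(2 r0 r_max r(G) log(1/delta)) + r_max log(1/delta)."""
    L = math.log(1 / delta)
    return math.sqrt(2 * r0 * r_max * r_G * L) + r_max * L

def violation_rate(weights, r0, delta, trials, rng):
    """Fraction of simulated subsets S with |r(S) - r0 r(G)| above the Lemma 15 radius."""
    r_G, r_max = sum(weights), max(weights)
    mean = r0 * r_G                                   # E[r(S)] = r0 * r(G)
    radius = lemma15_radius(r0, r_max, r_G, delta)
    bad = 0
    for _ in range(trials):
        r_S = sum(w for w in weights if rng.random() < r0)   # keep each element w.p. r0
        if abs(r_S - mean) > radius:
            bad += 1
    return bad / trials
```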

C.2.1 Results on Prune-set

We now show that with high probability Prune-set never removes an element with probability ≤ 2r(i).

Lemma 16. The probability that the algorithm removes an element j from the set S such that r(j) ≤ 2r(i) during step 2 of Binary-search is ≤ δ/5.

Proof. If r(j) ≤ 2r(i), then $\frac{r(j)}{r(i)+r(j)} \le \frac{2}{3}$. Applying the Chernoff bound,

$$\Pr\left(n(j) > \frac{3n_2}{4}\right) \le e^{-n_2/72}.$$

Since the algorithm uses this step no more than $O(n_1\log\log k)$ times, the total error probability is less than $O(n_1\log\log k\cdot e^{-n_2/72})$. Since $n_1$ is $\mathrm{poly}(\log\log\log k, \epsilon^{-1}, \log\delta^{-1})$ and $n_2 = O(\log\log\log k + \log\frac{1}{\delta})$, the error probability is ≤ δ/5. ■
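The Chernoff step in Lemma 16 can be compared against the exact binomial tail. When r(j) ≤ 2r(i), each of the $n_2$ samples from $r_{\{i,j\}}$ equals j with probability at most 2/3, and Lemma 12 with deviation 1/12 gives $\Pr(n(j) > 3n_2/4) \le e^{-n_2/72}$. The function names below are illustrative.

```python
import math

def false_removal_prob(n2, p_j):
    """Exact Pr(n(j) > 3 n2 / 4) when each of the n2 samples equals j w.p. p_j."""
    k_min = (3 * n2) // 4 + 1          # smallest integer strictly greater than 3 n2 / 4
    return sum(math.comb(n2, t) * p_j**t * (1 - p_j)**(n2 - t)
               for t in range(k_min, n2 + 1))

def chernoff_removal_bound(n2):
    """Bound from the proof of Lemma 16 when p_j <= 2/3: exp(-n2 / 72)."""
    return math.exp(-n2 / 72)
```

Taking $n_2 = \lceil 72\log(1/\delta)\rceil$ therefore keeps each false removal below δ.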

We now show that, with high probability, Prune-set removes all elements with probability > 4r(i). Recall that δ' = ….

Lemma 17. If element i is α-heavy, β-approximable, and $r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, then Prune-set removes all elements j such that r(j) > 4r(i) during all calls of step 2 of Binary-search with probability ≥ 1 − δ/5.

Proof. Let A = {j : r(j) ≤ 4r(i)} and S' = S ∩ A. By Lemma 15, with probability ≥ 1 − 2δ',

$$r(S') \le r_{\text{guess}}\,r(A) + \sqrt{8r_{\text{guess}}r(i)r(A)\log\frac{1}{\delta'}} + 4r(i)\log\frac{1}{\delta'} \le 2r_{\text{guess}} + 8r(i)\log\frac{1}{\delta'},$$
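where the last inequality follows from $\sqrt{2ab} \le a + b$ and r(A) ≤ 1. Observe that $|\{j : r(j) > 4r(i)\}| < \frac{1}{4r(i)}$. Let S'' = S \ S'. By Lemma 15, with probability ≥ 1 − 2δ',

$$\nu \stackrel{\text{def}}{=} |S''| \le \frac{r_{\text{guess}}}{4r(i)} + \sqrt{\frac{r_{\text{guess}}}{2r(i)}\log\frac{1}{\delta'}} + \log\frac{1}{\delta'} \le \frac{r_{\text{guess}}}{2r(i)} + 2\log\frac{1}{\delta'}.$$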

S has ν elements with probability > 4r(i). Suppose we have observed j of these elements and removed them from S. There are ν − j of them left in S. After taking another $\eta_j$ samples from S, the probability of not observing a (j+1)th heavy element is $\le \left(\frac{r(S')}{r(S') + 4r(i)(\nu-j)}\right)^{\eta_j}$. Therefore,

$$\eta_j \stackrel{\text{def}}{=} \frac{\log\frac{\nu}{\delta'}}{\log\left(1 + \frac{4r(i)(\nu-j)}{r(S')}\right)} \le \left(1 + \frac{r(S')}{4r(i)(\nu-j)}\right)\log\frac{\nu}{\delta'}$$

samples suffice to observe an element from S'' with probability ≥ 1 − δ'/ν. After observing the sample (call it j), similar to the proof of Lemma 16 it can be shown that with probability ≥ 1 − δ', for $n_2$ samples from $r_{\{i,j\}}$, $n(j) > 3n_2/4$, and hence j will be removed from S. Thus to remove all ν elements of probability > 4r(i), we need to repeat this step

$$n_1 \ge \sum_{j=1}^{\nu}\left(1 + \frac{r(S')}{4r(i)j}\right)\log\frac{\nu}{\delta'}$$

times. Substituting r(S') and ν in the RHS and simplifying gives the required number of repetitions. By the union bound, the total error probability is ≤ δ'. Since the number of calls to Prune-set during step 2 of the algorithm is at most log log k, the error is at most log log k · 2δ' ≤ δ/5, and the lemma follows from the union bound. ■


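C.2.2 Proof of Lemma 10

The proof of Lemma 10 follows from the following two sub-lemmas. In Lemma 18, we show that if $r_{\text{guess}} \ge \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, then step 4 returns heavy, and in Lemma 19 that if $r_{\text{guess}} \le \frac{r(i)}{\gamma}$, then step 4 returns light, each with high probability. Since we have log log k iterations over the search range $[1/k, 1]$, we reach $\frac{r(i)}{\gamma} \le r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$ at some point of the algorithm.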

Lemma 18. If $r_{\text{guess}} \ge \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, i is α-heavy and β-approximable, and Prune-set has removed none of the elements with probability ≤ 2r(i), then with probability ≥ 1 − 4δ', step 4 outputs heavy.

Proof. Let $G'_i = G_i \setminus \{i\}$. Since i is α-heavy and β-approximable, by convexity

$$\frac{r(G'_i)}{r(G_i)}\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)}\right| \ge \left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)}\right| \ge \beta.$$
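Hence $r(G'_i) \ge \beta r(G_i)/2$. By assumption, all the elements with probability ≤ 2r(i) in set S remain after pruning. Thus all the elements in set S from $G'_i$ remain after pruning. Let $S' = G'_i \cap S$. By Lemma 15 with $G = G'_i$, $r_{\max} = r(i)$, and $r_0 = r_{\text{guess}}$,

$$\Pr\left(r(S') < r_{\text{guess}}r(G'_i) - \sqrt{2r_{\text{guess}}r(i)r(G'_i)\log\frac{1}{\delta'}} - r(i)\log\frac{1}{\delta'}\right) \le 2\delta'. \qquad (3)$$

Taking derivatives, it can be shown that the slope of the term inside the parentheses, as a function of $r_{\text{guess}}$, is $r(G'_i) - \sqrt{\frac{r(i)r(G'_i)\log(1/\delta')}{2r_{\text{guess}}}}$, which is positive for $r_{\text{guess}} \ge \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$. Thus the value is minimized at $r_{\text{guess}} = \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$; simplifying this lower bound using the values of γ and β, we get

$$r_{\text{guess}}r(G'_i) - \sqrt{2r_{\text{guess}}r(i)r(G'_i)\log\frac{1}{\delta'}} - r(i)\log\frac{1}{\delta'} \ge \frac{\gamma r(i)}{4}.$$

Since Pr(X < b) ≤ Pr(X < b') for b ≤ b', we have

$$\Pr\left(r(S') < \frac{\gamma r(i)}{4}\right) \le 2\delta'.$$

Hence with probability ≥ 1 − 2δ', $\frac{r(i)}{r(S)} \le \frac{r(i)}{r(S')} < \frac{4}{\gamma}$. By the Chernoff bound,

$$\Pr\left(\frac{n(i)}{n_3} \ge \frac{5}{\gamma}\right) \le \Pr\left(\frac{n(i)}{n_3} \ge \frac{r(i)}{r(S)+r(i)} + \frac{1}{\gamma}\right) \le e^{-2n_3/\gamma^2}.$$

Therefore, for $n_3 \ge O(\gamma^2\log\frac{1}{\delta'})$, step 4 outputs heavy with probability ≥ 1 − 2δ'. By the union bound the total error probability is ≤ 4δ'. ■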

Lemma 19. If $r_{\text{guess}} \le \frac{r(i)}{\gamma}$ and Prune-set has removed all elements with probability > 4r(i) and none of the elements with probability ≤ 2r(i), then with probability ≥ 1 − 4δ', step 4 outputs light.

Proof. The proof is similar to that of Lemma 18. By assumption all the elements have probability ≤ 4r(i). By Lemma 15,

$$\Pr\left(\frac{r(S)}{r(i)} > \frac{r_{\text{guess}}}{r(i)} + \sqrt{\frac{8r_{\text{guess}}}{r(i)}\log\frac{1}{\delta'}} + 4\log\frac{1}{\delta'}\right) \le 2\delta'.$$

Similar to the analysis after Equation (3), taking derivatives it can be shown that the term inside the parentheses is maximized when $r_{\text{guess}} = \frac{r(i)}{\gamma}$ over the range $[0, \frac{r(i)}{\gamma}]$. Thus, simplifying the above expression with this value of $r_{\text{guess}}$ and the value of γ, with probability ≥ 1 − 2δ', $r(S) \le \gamma r(i)/10$. Thus with probability ≥ 1 − 2δ',

$$\frac{r(i)}{r(S)+r(i)} \ge \frac{1}{1+\gamma/10} \ge \frac{6}{\gamma}.$$

By the Chernoff bound,

$$\Pr\left(\frac{n(i)}{n_3} \le \frac{5}{\gamma}\right) \le \Pr\left(\frac{n(i)}{n_3} \le \frac{r(i)}{r(S)+r(i)} - \frac{1}{\gamma}\right) \le e^{-2n_3/\gamma^2}.$$

The lemma follows from the bound on $n_3$ and, by the union bound, the total error probability is ≤ 4δ'. ■
Note that the conditions in Lemmas 18 and 19 hold with probability ≥ 1 − 2δ/5 by Lemmas 17 and 16. Furthermore, since we use all the steps at most log log k times, by the union bound, the conclusion in Lemma 10 fails with probability ≤ 2δ/5 + 8δ' log log k < δ.

C.3 Proof of Theorem 11

For ease of readability, we divide the proof into several sub-cases. We first show that if p = q, then the algorithm returns same with high probability. Recall that for notational simplicity we redefine $\delta' = \frac{\epsilon\delta}{32m(n_4+1)\log\log k}$.

Lemma 20. If p = q, Closeness-test outputs same with error probability ≤ δ.

Proof. Note that the algorithm returns diff only if one of the Test-equals returns diff. We call Test-equal at most $\frac{16}{\epsilon}\cdot\log\log k\cdot m\cdot(n_4+1)$ times. The probability that any single call returns an error is ≤ δ'. Thus by the union bound the total error probability is $\le 16\delta'\log\log k\cdot m(n_4+1)/\epsilon < \delta$. ■


We now prove the result when $\|p - q\|_1 \ge \epsilon$. We first state a lemma showing that Prune-set ensures that set S does not have any elements with probability > 4r(i). The proof is similar to that of Lemmas 17 and 16 and hence omitted.

Lemma 21. If i is α-heavy and β-approximable, then at any call of step 2 of Assisted-closeness-test, with probability ≥ 1 − …, if $r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, Prune-set never removes an element with probability ≤ 2r(i) and removes all elements with probability > 4r(i).


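The proof when $\|p - q\|_1 \ge \epsilon$ is divided into two parts based on the probability of certain events. Let $\beta' = \frac{p(i)-q(i)}{p(i)+q(i)}$ and $\beta'' = \frac{\alpha\beta}{128\gamma\log\frac{128\gamma}{\alpha\beta^2}}$. Let D denote the event that an element j from $G_i^c$ with $\left|\frac{p(j)-q(j)}{p(j)+q(j)} - \beta'\right| \ge \beta''$ and r(j) ≤ 4r(i) gets included in S. We divide the proof into two cases: when $\Pr(D) \ge \frac{\alpha\beta^2}{128\gamma}$ and when $\Pr(D) < \frac{\alpha\beta^2}{128\gamma}$.

Lemma 22. Suppose $\|p - q\|_1 \ge \epsilon$. If i is α-heavy and β-approximable, $\frac{r(i)}{\gamma} \le r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, the conclusions in Lemma 21 hold, and $\Pr(D) \ge \frac{\alpha\beta^2}{128\gamma}$, then step 3(a) of Assisted-closeness-test returns diff with probability ≥ 1/5.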


Proof. We show that the following four events happen with high probability for at least one set $S \in \{S_1, S_2, \ldots, S_m\}$.

• S includes a j such that $\left|\frac{p(j)-q(j)}{p(j)+q(j)} - \beta'\right| \ge \beta''$, r(j) ≤ 4r(i), and $j \notin G_i$.

• $r(S) \le r_{\text{guess}} + 8\sqrt{r(i)r_{\text{guess}}}$.

• j appears when $S \cup \{i\}$ is sampled $n_4$ times.

• Test-equal returns diff.
Clearly, if the above four events happen then the algorithm outputs diff. Thus to bound the error probability, we bound the error probability of each of the above four events and use the union bound. The probability that at least one of the sets contains an element j such that $\left|\frac{p(j)-q(j)}{p(j)+q(j)} - \beta'\right| \ge \beta''$, r(j) ≤ 4r(i), and $j \notin G_i$ is

$$1 - (1 - \Pr(D))^m \ge 1 - \frac{1}{6}.$$

Let S' = {j ∈ S : r(j) ≤ 4r(i)}. Observe that before pruning $E[r(S')] \le r_{\text{guess}}$ and $\mathrm{Var}(r(S')) \le 4r(i)r_{\text{guess}}$. Hence by Chebyshev's inequality, with probability ≥ 1 − 1/16,

$$r(S') \le r_{\text{guess}} + 4\sqrt{4r(i)r_{\text{guess}}} = r_{\text{guess}} + 8\sqrt{r(i)r_{\text{guess}}}.$$

After pruning, S contains only elements from S'. Hence with probability ≥ 1 − 1/16, $r(S) \le r_{\text{guess}} + 8\sqrt{r(i)r_{\text{guess}}}$. The probability that the element j does appear when $S \cup \{i\}$ is sampled $n_4$ times is

$$1 - \left(1 - \frac{r(j)}{r(S)+r(i)}\right)^{n_4} \ge 1 - \left(1 - \frac{r(i)}{r_{\text{guess}}\left(1 + 8\sqrt{r(i)/r_{\text{guess}}}\right) + r(i)}\right)^{n_4} \ge 1 - \frac{1}{6}.$$

Since $\left|\frac{p(j)-q(j)}{p(j)+q(j)} - \beta'\right| \ge \beta''$ and r(i) ≤ r(j) ≤ 4r(i), by Lemma 14 the chi-squared distance is

$$\ge \beta''^2\,\frac{r(j)r(i)}{4(r(i)+r(j))^2} \ge \frac{\beta''^2}{25}.$$

Thus by Lemma 1, the algorithm outputs diff with probability ≥ 1 − δ'. By the union bound the total error probability is ≤ 1/6 + 1/16 + 1/6 + δ' < 4/5. ■

Lemma 23. Suppose $\|p - q\|_1 \ge \epsilon$. If i is α-heavy and β-approximable, $\frac{r(i)}{\gamma} \le r_{\text{guess}} \le \frac{\gamma}{\beta}\cdot\frac{r(i)}{r(G_i)}$, the conclusions in Lemma 21 hold, and $\Pr(D) < \frac{\alpha\beta^2}{128\gamma}$, then step 3(b) of Assisted-closeness-test returns diff with probability ≥ 1/5.

We show that the following four events happen with high probability for at least some set $S \in \{S_1, S_2, \ldots, S_m\}$. Let $S' = S \cap G_i$ and $G'_i = G_i \setminus \{i\}$. Let

$$Z = r(S')\left|\beta' - \frac{p(S')-q(S')}{p(S')+q(S')}\right| = \frac{|\beta'(p(S')+q(S')) - (p(S')-q(S'))|}{2}.$$

• $Z \ge r_{\text{guess}}\,|\beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i))|/4$.

• $r(S) \le 8(r(i)+r_{\text{guess}})\log\frac{128\gamma}{\alpha\beta^2}$.

• Event D does not happen.

• Test-equal outputs diff.

Clearly if all of the above events happen, then the test outputs diff. We now bound the error probability of each of the events and use the union bound. Since none of the elements in S' undergo pruning, the value of Z remains unchanged before and after pruning. Thus any concentration
inequality for Z remains the same after pruning. We now compute the expectation and variance of Z and use the Paley-Zygmund inequality.

$$E[Z] = E\big[|\beta'(p(S')+q(S')) - (p(S')-q(S'))|\big]/2 \ge \big|E[\beta'(p(S')+q(S')) - (p(S')-q(S'))]\big|/2 = r_{\text{guess}}\,|\beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i))|/2,$$

where the inequality follows from convexity of the function |·|. Let 1(j, S') denote the event that j ∈ S'. The variance is upper bounded as

$$\begin{aligned}\mathrm{Var}(Z) &= E[Z^2] - (E[Z])^2 \\ &= E\big[(\beta'(p(S')+q(S')) - (p(S')-q(S')))^2\big]/4 - E^2\big[|\beta'(p(S')+q(S')) - (p(S')-q(S'))|\big]/4 \\ &\overset{(a)}{\le} E\big[(\beta'(p(S')+q(S')) - (p(S')-q(S')))^2\big]/4 - E^2\big[\beta'(p(S')+q(S')) - (p(S')-q(S'))\big]/4 \\ &= \mathrm{Var}\big(\beta'(p(S')+q(S')) - (p(S')-q(S'))\big)/4 \\ &\overset{(b)}{=} \sum_{j\in G'_i}\mathrm{Var}(1(j,S'))\,(\beta'(p(j)+q(j)) - (p(j)-q(j)))^2/4 \\ &\le \sum_{j\in G'_i} E[1(j,S')]\,(\beta'(p(j)+q(j)) - (p(j)-q(j)))^2/4 \\ &= \sum_{j\in G'_i} r_{\text{guess}}\,(\beta'(p(j)+q(j)) - (p(j)-q(j)))^2/4 \\ &\le \max_{j'\in G'_i}|\beta'(p(j')+q(j')) - (p(j')-q(j'))|\cdot r_{\text{guess}}\sum_{j\in G'_i}|\beta'(p(j)+q(j)) - (p(j)-q(j))|/4 \\ &\overset{(c)}{\le} 4r(i)\,r_{\text{guess}}\,r(G'_i).\end{aligned}$$

(a) follows from $E[|X|] \ge |E[X]|$, as in the bound on the expectation. (b) follows from the independence of the events 1(j, S'). (c) follows from the facts that p(j) + q(j) = 2r(j) ≤ 2r(i), |β'| ≤ 1, and |p(j) − q(j)| ≤ p(j) + q(j). Hence by the Paley-Zygmund inequality,

$$\Pr\Big(Z > r_{\text{guess}}|\beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i))|/4\Big) \ge \Pr(Z > E[Z]/2) \ge \frac14\cdot\frac{E^2[Z]}{\mathrm{Var}(Z)+E^2[Z]} \ge \frac14\cdot\frac{E^2[Z]}{4r(G'_i)r(i)r_{\text{guess}} + E^2[Z]}.$$
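The Paley-Zygmund step uses $\Pr(Z > E[Z]/2) \ge \frac14 E^2[Z]/(\mathrm{Var}(Z) + E^2[Z])$ for a nonnegative random variable Z. The sketch verifies this exactly (by enumerating all outcomes) for a nonnegative weighted sum of independent Bernoulli variables, a stand-in chosen for illustration rather than the Z defined above; the function names are illustrative.

```python
import itertools
import random

def exact_distribution(values, probs):
    """All (z, weight) outcomes of Z = sum_j values[j] * B_j with independent B_j ~ Bernoulli(probs[j])."""
    outcomes = []
    for bits in itertools.product((0, 1), repeat=len(values)):
        weight = 1.0
        for b, p in zip(bits, probs):
            weight *= p if b else 1.0 - p
        outcomes.append((sum(v * b for v, b in zip(values, bits)), weight))
    return outcomes

def paley_zygmund_sides(values, probs):
    """Return (Pr(Z > E[Z]/2), E[Z]^2 / (4 (Var(Z) + E[Z]^2))) for the nonnegative Z above."""
    dist = exact_distribution(values, probs)
    mean = sum(z * w for z, w in dist)
    second_moment = sum(z * z * w for z, w in dist)   # equals Var(Z) + E[Z]^2
    lhs = sum(w for z, w in dist if z > mean / 2)
    return lhs, mean ** 2 / (4 * second_moment)
```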

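Since i is β-approximable, by convexity

$$\frac{r(G'_i)}{r(G_i)}\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)}\right| \ge \left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)}\right| \ge \beta.$$

Hence,

$$r(G'_i)\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)}\right| \ge r(G_i)\beta.$$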
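Thus $E[Z] \ge r_{\text{guess}}r(G_i)\beta$ and

$$\begin{aligned}\Pr\Big(Z > r_{\text{guess}}|\beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i))|/4\Big) &\ge \frac14\cdot\frac{(r_{\text{guess}}r(G_i)\beta)^2}{4r(i)r_{\text{guess}}r(G'_i) + (r_{\text{guess}}r(G_i)\beta)^2} \\ &\ge \frac14\cdot\frac{(r_{\text{guess}}r(G_i)\beta)^2}{2\max\big(4r(i)r_{\text{guess}}r(G'_i), (r_{\text{guess}}r(G_i)\beta)^2\big)} \\ &= \frac18\min\left(1, \frac{(r_{\text{guess}}r(G_i)\beta)^2}{4r(i)r_{\text{guess}}r(G'_i)}\right) \\ &\ge \frac18\min\left(1, \frac{r(G_i)\beta^2}{4\gamma}\right) \ge \frac{\alpha\beta^2}{32\gamma}.\end{aligned} \qquad (4)$$

By Lemma 15, with probability ≥ 1 − αβ²/(128γ),

$$r(S) \le r_{\text{guess}} + \sqrt{8r_{\text{guess}}r(i)\log\frac{128\gamma}{\alpha\beta^2}} + 4r(i)\log\frac{128\gamma}{\alpha\beta^2} \le 8(r(i)+r_{\text{guess}})\log\frac{128\gamma}{\alpha\beta^2}. \qquad (5)$$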


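Let S'' = S \ S'. If event D has not happened, then for all elements j ∈ S'', $\left|\beta' - \frac{p(j)-q(j)}{p(j)+q(j)}\right| < \beta''$, and hence

$$\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S'')-q(S'')}{p(S'')+q(S'')}\right| < \beta''. \qquad (6)$$

Combining the above set of equations,

$$\begin{aligned}\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S)-q(S)}{p(S)+q(S)}\right| &\overset{(a)}{\ge} \frac{r(S')}{r(S)}\left|\beta' - \frac{p(S')-q(S')}{p(S')+q(S')}\right| - \frac{r(S'')}{r(S)}\left|\frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S'')-q(S'')}{p(S'')+q(S'')}\right| \\ &\overset{(b)}{\ge} \frac{Z}{r(S)} - \beta'' \\ &\overset{(c)}{\ge} \frac{2r(G'_i)r_{\text{guess}}}{4r(S)}\left|\beta' - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)}\right| - \beta'' \\ &\ge \frac{r(G_i)r_{\text{guess}}\beta}{2r(S)} - \beta'' \\ &\overset{(d)}{\ge} \frac{r(G_i)r_{\text{guess}}\beta}{8r(S)}.\end{aligned}$$

(a) follows from convexity and the fact that |a + b| ≥ |a| − |b|, (b) follows from Equation (6), and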
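(c) follows from Equation (4). (d) follows from the fact that

$$\beta'' = \frac{\alpha\beta}{128\gamma\log\frac{128\gamma}{\alpha\beta^2}} \le \frac{\beta}{128\log\frac{128\gamma}{\alpha\beta^2}}\min\left(\frac{r(G_i)r_{\text{guess}}}{r(i)}, r(G_i)\right) = \frac{\beta\,r(G_i)r_{\text{guess}}}{64\log\frac{128\gamma}{\alpha\beta^2}\cdot 2\max(r(i), r_{\text{guess}})} \le \frac{\beta\,r(G_i)r_{\text{guess}}}{64\log\frac{128\gamma}{\alpha\beta^2}\,(r(i)+r_{\text{guess}})} \le \frac{r(G_i)r_{\text{guess}}\beta}{8r(S)},$$

where the last inequality uses Equation (5).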

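Thus by Lemma 14, the chi-squared distance is lower bounded by

$$\begin{aligned}\left(\frac{r(G_i)r_{\text{guess}}\beta}{8r(S)}\right)^2\frac{r(i)r(S)}{4(r(i)+r(S))^2} &= \frac{r_{\text{guess}}^2\beta^2r(i)r^2(G_i)}{2^8 r(S)(r(i)+r(S))^2} \\ &\overset{(a)}{\ge} \frac{r_{\text{guess}}^2\beta^2r(i)r^2(G_i)}{2^{20}(r(i)+r_{\text{guess}})^3\log^3\frac{128\gamma}{\alpha\beta^2}} \\ &\ge \frac{r_{\text{guess}}^2\beta^2r(i)r^2(G_i)}{2^{20}\cdot 8\max(r^3(i), r_{\text{guess}}^3)\log^3\frac{128\gamma}{\alpha\beta^2}} \\ &= \frac{r^2(G_i)\beta^2}{2^{23}\log^3\frac{128\gamma}{\alpha\beta^2}}\min\left(\frac{r(i)}{r_{\text{guess}}}, \frac{r_{\text{guess}}^2}{r^2(i)}\right) \\ &\overset{(b)}{\ge} \frac{r^2(G_i)\beta^2}{2^{23}\log^3\frac{128\gamma}{\alpha\beta^2}}\min\left(\frac{\beta r(G_i)}{\gamma}, \frac{1}{\gamma^2}\right) \\ &\ge \frac{\alpha^3\beta^3}{2^{23}\gamma^2\log^3\frac{128\gamma}{\alpha\beta^2}}.\end{aligned}$$

(a) follows from Equation (5) and (b) follows from the bounds on $r_{\text{guess}}$. Thus with probability ≥ 1 − δ', Test-equal outputs diff. By the union bound, the error probability for an $S \in \{S_1, S_2, \ldots, S_m\}$ is

$$\le 1 - \frac{\alpha\beta^2}{32\gamma} + \frac{\alpha\beta^2}{128\gamma} + \frac{\alpha\beta^2}{128\gamma} + \delta' \le 1 - \frac{\alpha\beta^2}{128\gamma}.$$

Since we are repeating it for m sets, the probability that it outputs diff is

$$\ge 1 - \left(1 - \frac{\alpha\beta^2}{128\gamma}\right)^m \ge 1 - e^{-\frac{\alpha\beta^2 m}{128\gamma}}.$$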

Theorem 11 follows from Lemma 20 for the case p = q. If $\|p - q\|_1 \ge \epsilon$, it follows from Lemmas 9 (finds a good tuple), 10 (finds a good approximation $r_{\text{guess}}$), 21 (pruning), 22 (Pr(D) is large), and 23 (Pr(D) is small). By Lemma 20, the success probability when p = q is ≥ 1 − δ. The success probability when $\|p - q\|_1 \ge \epsilon$ is at least the probability that we pick a good tuple (i, β, α) times the success probability once a good tuple is picked (accounting for the sum of the errors in Lemmas 10 and 21 and the maximum of the errors in Lemmas 22 and 23), which can be shown to be ≥ (1/5)·(1/5 − δ − 2δ/5) > 1/30. We now analyze the number of samples our algorithm uses.

We first calculate the number of samples used by Assisted-closeness-test. Step 2 calls Prune-set m times and each time Prune-set uses $n_1n_2$ samples. Hence, step 2 uses $mn_1n_2$ samples. Step 3(a) uses $mn_4\cdot O(\beta''^{-2})$ samples and step 3(b) uses $m\cdot O(\ldots)$ samples. Hence, the total number of samples used by Assisted-closeness-test is

$$mn_1n_2 + mn_4\cdot O(\beta''^{-2}) + m\cdot O(\ldots) = \tilde{O}(\epsilon^{-4}).$$

Thus each Assisted-closeness-test uses $\tilde{O}(\epsilon^{-4})$ samples. Hence, the number of samples used by Binary-search is $\tilde{O}(\epsilon^{-4}\log\log k)$. Since Closeness-test calls Binary-search for 16/ε different tuples, the sample complexity of Closeness-test is

$$\frac{16}{\epsilon}\cdot\log\log k\cdot\tilde{O}(\epsilon^{-4}) = \tilde{O}(\epsilon^{-5}\log\log k).$$
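As a rough illustration of how the final bound scales, the sketch multiplies the three factors in the last display while dropping all constants and the polylogarithmic factors hidden in $\tilde O(\cdot)$, so only ratios of its outputs are meaningful. The function name is illustrative, and base-2 logarithms are assumed.

```python
import math

def closeness_sample_order(k, eps):
    """(16/eps) tuples * log log k search steps * eps^-4 samples per step (constants/polylogs dropped)."""
    return (16 / eps) * math.log2(math.log2(k)) * eps ** -4
```

Raising k from $2^{16}$ to $2^{256}$ only doubles log₂ log₂ k (from 4 to 8), while halving ε multiplies the count by $2^5 = 32$.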





